Abstract

We outline a model for a cognitive epigenetic system based on elements of the Shannon theory of information and the statistical physics of the generalized Onsager relations. Particular attention is paid to the concept of the rate distortion function and, from another direction motivated by the thermodynamics of computing, to its fundamental homology with the free energy density of a physical system. A unifying aspect of the dynamic framework involves the concept of a groupoid and of a groupoid atlas. From a stochastic differential equation we postulate a multidimensional Itô process for an epigenetic system, from which a stochastic flow may permeate through components of this atlas.
1 Introduction


Living systems, as far-from-equilibrium open systems, are essentially cognitive; conversely, in order to grasp the essence of cognition, one may attempt to understand the ontology of living processes. This is often viewed in the framework of a cell-like, structured, self-organizing system engaged in a two-way interaction between its functional mechanisms and its neighboring environment. Several theories revolve around this principle. Readers may be familiar with a central relationship as explained by autopoiesis, a seemingly enduring theory developed by Maturana and Varela (1980a, 1980b) over several decades. A much earlier and somewhat different hypothesis of Bertalanffy (1972) had proposed that living systems are maintained in non-equilibrium states by flow-like patterns when drawing matter and energy from their environment, and adjust accordingly in a "flowing balance" (Bertalanffy, 1972; Capra, 1996). Further scientific rigor aimed at understanding this hypothesis was achieved by Prigogine (1980), who formulated similar ideas within a theory of dissipative structures. Whereas living systems are continuously maintained far from equilibrium, dissipative structures do likewise while being capable of evolving. They are destabilized by increases in information and energy, yet as they remain self-organized and self-perpetuating, the complexity of their structure increases, often simply for the sake of survival within the environment. Elucidating a possible synthesis of these separate approaches, commencing from the Bertalanffy hypothesis, is the main topic of Capra (1996). But cognition is also a function of a prevailing culture: by means of social and historical patterns of behavior, traditions, etc., human cognition feeds back into that culture through the course of social interaction, adaptive technologies, policies, memetic trends, etc., and inevitably alters it in time (see e.g. Clark, 1997; Hollan et al., 2000; Hutchins, 1994; Richerson and Boyd, 2004; Wallace and Fullilove, 2008).

There are several common factors at stake here. In seeking to understand them, the 'immunology-language' viewpoint of Atlan and Cohen (1998) (see also Cohen, 2000) views human organizations at all levels as perceiving patterns of threat or opportunity, comparing those patterns with some internal, learned or inherited, picture of the world, and then choosing one or a small number of responses from a vastly larger repertory of those possible to them. Putting it another way, consider a basic observation concerning the immune system: it incorporates its own system of options and responds cognitively at the level of information processing, such that the meaning of an antigen is defined by the response of the immune system, somewhat reminiscent of earlier work of Jerne (1974), who had postulated such 'meaning'. The broader cognitive model that results can be said to be a 'reactive' system determined by contextual factors, in the sense of Cohen and Harel (2007), as depicted in Figure 1.

Starting from this basic perspective, Wallace (2005) has formulated a model of cognition that takes 'meaning' in the sense of Dretske's theory of semantic communication (Dretske, 1981, 1988), claiming that the immune perception/response information networks of any cognitive system must be constrained by Shannon's fundamental limit theorems of information theory (see e.g. Ash, 1990; Berger, 1971; Cover and Thomas, 1991). These networks, comprising cognitive modules, interact within the framework of a kind of broadcasting system relaying within 'a theater of consciousness'. This is the basic operative hypothesis of the Global (Neuronal) Workspace theory developed in (Baars, 1988; Baars and Franklin, 2003), which forms an integral part of the model in (Wallace, 2005; Wallace and Fullilove, 2008).

Here we take a further step forward that incorporates a number of related factors, mainly following the (generalized) Onsager relations of non-equilibrium thermodynamics combined with the principles of rate distortion theory, which is one of the mainstays of Shannon's pioneering work. One particular observation is that distortion in communication between interacting cognitive modules is patently stochastic; in particular, it is manifestly a process of Brownian motion. This observation is developed in several steps relative to a corresponding stochastic differential equation that eventually unfolds into a kind of master equation. By considering critical solutions for the distortion, we argue on a case-by-case basis that the ensuing Brownian motion (viz. Wiener process) can be locally one of bounded variation. Accordingly, we are motivated to introduce the necessary conditions that permit integration of these equations in order to obtain a multidimensional Itô process; further, we discuss the formal conditions that describe an associated stochastic solution flow. Thus, in a way, we have recovered Bertalanffy's perception of the "flow" of living systems, of which information-tied epigenetic systems are certain representations.

Whereas such model conditions are mathematically stated for a flow in a smooth manifold geometry, we do not claim 
such 'smoothness' in general. Indeed, such epigenetic 'flow systems' in living organisms and brain-environment interactions 
may be at best continuous over time and are most likely to be 'singular' in some sense. Such singular flows require harder topological techniques, with possible complications that detract from a more natural interpretation. Eventually one seeks a workable
common ground between various mathematical abstractions and the actual empirical properties of the systems themselves. 

2 The basis of the epigenetic model 
2.1 Gene-environment interaction 

Another slant emerging from this cognitive paradigm points towards epigenetic information sources as providing a tunable catalyst for gene expression, by which the embedding of information sources can direct developmental pathways within the ontology of enveloped structures via the processing of information (Wallace and Wallace, 2008). This leads to further development of several cognitive paradigms for gene expression, in part extending the scientific reasoning of Jablonka and Lamb (2005), who consider epigenetic inheritance systems in which information may be transmitted through generations not simply through the base sequence of DNA, but also via cultural and behavioral means in higher animals, and by epigenetic means in cell lineages. In cell lineages this transmission is initiated by memory systems that enable somatic cells of differing phenotype, but of identical genotype, to transmit their phenotypes to their descendants, even in the absence of the original stimuli that had engaged these phenotypes.

Conditional upon an individual's phenotype, environmental factors may trigger alterations in behavior and health, eventually impinging upon the nervous system with the likely consequence of mental disarray. In this respect Moffitt et al. (2006) (cf. Caspi and Moffitt, 2006) make several observations concerning: (1) heritable versus environmental influences on phenotype variation across a given environment, (2) gene expression altered via epigenetic programming in response to subsequent health-behavioral reactions towards the environment, (3) how an individual's phenotype determines a level of risk with respect to the environment, and (4) behavioral effects due to interdependence between specified variations in the DNA sequence and a measured environment. Indispensable to understanding and extending these findings is the parallel question of how gene-culture interaction, between two distinct but interacting hereditary systems, bears on comparisons of psychopathology across Eastern and Western cultures in basic perceptual processes, often an important piece missing from the larger picture (Nisbett, 2003; Richerson and Boyd, 2004; Wallace, 2005; Wallace, 2009). As distinct interacting systems of information influencing action and behavior, both kinds (genes and culture) are claimed by Durham (1991) to create a real and unambiguous symmetry between genes and phenotypes on the one hand, and culture and phenotypes on the other, whereby genes and culture may reasonably be viewed as two parallel lines of hereditary influence on phenotypes. From the perspective of (Wallace, 2005; Wallace and Wallace, 2008; Wallace and Wallace, 2009), both can be realized as generalized languages in the sense that they have their own intrinsic recognizable grammar and syntax (see §3.3). Likewise, reflecting upon the fundamental mechanisms of serial endosymbiotic theory (see e.g. Margulis, 2004), Witzany (2006) argues for extending the latter via biosemiotic cell-cell interactions treated as 'signed' language-communication processes, subject to a range of syntactic, pragmatic and semantic rules as applied to protein-coding DNA, RNA editing, DNA splicing, transcription and other essential functions.

Figure 1: An (inter)'reactive' system (adapted from Fig. 2, p. 177 of Cohen and Harel, 2007). We may interpret this as a large-scale scheme of interacting and reciprocating processes modeled upon a kind of 'atlas' (see below).

2.2 Autopoiesis and the structuralist approach 

A more general perspective is to consider the central relations characterizing the system and its various structural components, and how a state is altered under perturbations by the environment. An autopoietic system is organizationally closed and structurally determined. The system's autopoiesis is preserved within the living state, adaptable only to structural fluctuations for as long as the living entity survives within, and is structurally coupled to, its environment; otherwise there is termination. The nervous system (though not itself strictly autopoietic) functionally self-creates (replicates) in order to engage in cognition. Though structurally dependent, the nervous system affords an innate plasticity and, with appropriate alterations, it is conducive to learning and can adapt itself towards broader interactions and human self-consciousness (Maturana and Varela, 1980a, 1980b). Broader interpretations of this theory are addressed in Mingers (1991), such as the family nexus, which via its idiosyncratic behavioral and linguistic interactions (in relation to hereditary factors, socioeconomic status, environment, culture, etc.) creates a peculiar structured reality that for the most part is accessible only to its members, and which a prospective psychotherapist must explore in order to fathom the recurrent patterns of conversation and behavior influencing the cognitive malfunctioning of any of those concerned (cf. Laing and Esterson, 1964).

There are related theories with varying degrees of overlap with the overall concepts of Maturana and Varela (1980a, 1980b). One example is a biogenetic-structuralist theory (Laughlin and d'Aquili, 1974) in which connections within an evolutionary context are made between neural organization/brain-functional activity and the environment. The result is "acquired models of reality", again a cognitive outcome of the response to and engagement with the environment, proposing a concept of 'neurognosis', a kind of 'holographic' model based upon biogenetically induced rudimentary information embedded in various associated regions of the brain, as taken from birth towards the later stages of maturity. The neurognostic model depends upon ontogenetic feedback from an external reality according to which acquired components are continually engaged with sensory input. Such modes are claimed to be represented in neural processes by probability expectations which, as we shall see, lie at the heart of information theory, and which encode the behavior of these processes in relation to the environment. Thus it may be reasonable to suppose that some enriched environments reciprocate to greater advantage than others, and that the structures governing the alignment of the self to the environment evolved, to an extent, in accordance with some degree of mastery over the latter. Take, for instance, the hypothalamic-pituitary-adrenal (HPA) axis, which governs the neurophysiology of the "fight or flight" mechanism: it is cognitive in the sense of Atlan and Cohen (1998). If the individual's close environment provokes arousal, then mind, memory and emotional cognition engage, evaluate and select appropriate responses. The HPA axis accelerates this process, and its malfunctioning may induce hyper-reactivity, as observed in cases of post-traumatic stress disorder. Depression, as another example, may be partly viewed as the evolution of a structure conducive to a negative alignment of the self with an external reality. If we think of a child as gradually mastering its environment via a symbiotic relation with its mother, and then take away the mother, the inability to further manage the environment activates a negative kind of (neurognostic) structure, such as those researched in the context of various evolutionary theories of depression founded upon the occurrence of attachment-defeat-loss, diminished opportunities, down-regulation of foraging capability, social/professional rank, etc., incurred with varying risk factors within a culturally influenced environment (see e.g. Gilbert, 2006; Moffitt et al., 2006; Wallace, 2009). Another example is the body's blood pressure control system, which consists of a network of cognitive systems that compare a set of incoming signals with an internal reference structure in order to select a suitable level of blood pressure from the possible levels; hence, as claimed in (Wallace and Fullilove, 2008; Wallace and Wallace, 2009), an elaborate tumor control strategy must be at least as cognitive as the immune system itself.

2.3 Trail systems and Roman roads 

Thus granted that most (if not all) given classes of cognitive modules interact within their cultural environment, we may 
proceed to consider what happens when the environment is the communication medium itself. For instance, in typical 
AI laboratory models where multi-agent, inter-sensing systems function in local coordinated tasks (often with no explicit communication between agents), the eventual net effect may induce a 'shared memory' (Cao et al., 1997; Krieger et al., 2000). For instance, it was claimed that foraging tasks were performed more energetically and efficiently by multi-agent (robotic) systems than by individual agents, and that the former tended to produce behavioral patterns similar to those of 'ant-like' decentralized control systems. On the other hand, biological regulatory networks, besides being susceptible to alterations in the environment and/or intracellular conditions, may operate stochastically to varying degrees, as seen along the pathways of neuronal signal transduction (Manninen et al., 2006), which provides an analogy motivating part of this paper (see §6.5). A similar scenario is that of neuronal 'Trail Systems' (TS) in Glade et al. (2009): single wires and logical gates in
a self-organized bioprocessor along which self-propelled particles communicate via traces etched out in the environment, 
thus creating the TS. The claim of Glade et al. (2009) is that such systems, although not precisely defined, are capable 
of programming and function on the basis of a Turing machine. It suggests an evolutionary factor by which various living 
beings, by activating their respective nervous systems, have trained themselves to use models like the TS towards evolution 
within and possible mastery over their environment, by means of simulating 'trail' signals. More from an information-theory 
viewpoint, Wallace (2009b) envisages similar ideas to trail systems as reminiscent of 'Roman roads': decision making and 
tasking within small local communities, eventually creating 'roads', manifestly inter-connected cognitive modules having 
different time constraints, but eventually creating extensions of 'local consciousness'. A rate distortion argument applies here 
to account for the mutual crosstalk between different modules using the homology of the rate distortion function with free 
energy as will be described later. Next we proceed with some specifics. 

3 Rate distortion and source entropy 
3.1 The rate distortion function 

As is well known, distortion arises when information is relayed through some channel at a rate exceeding the latter's capacity. One of the principles of the Shannon theory is that in order to reproduce a message transmitted from a source to
a receiver, it is necessary to know what sort of information should be transmitted and how. For the purpose of engineering 
a communication system, one needs to figure out a suitable encoding/decoding system once the nature of the channel is 
specified. Following Berger (1971), we briefly recall some of the basic principles involved. 

Source encoder: We may consider some output $x(t)$ emanating from the source as projected onto a finite set of preselected images; namely, the space of possible source outputs is partitioned into a set of equivalence classes, and the source encoder informs the channel encoder of the class containing the particular source output observed. Once the channel encoder is informed that the source output belongs to, say, the $m$-th equivalence class, it transmits the corresponding waveform $x_m(t)$ across the channel.

Source decoder: Within the receiver is a cascade of a channel decoder and a source decoder. The channel decoder receives a waveform $y(t)$ over some time interval and decides upon the message that was transmitted. It then sends its approximation $m'$ of the message number to the source decoder, which in turn creates $y_{m'}(t)$ to register the system's estimate of $x(t)$ over that time interval. Initially, we may think of $x(t)$ and $y(t)$ as 'waveforms', but in our case we consider these as consisting of a language with its own intrinsic grammar/syntax, as well as 'meaning', to be made more specific in §3.3. Analogous considerations apply to the channel signals $x(t)$ and $y(t)$ (see Figure 2).
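The encoder/decoder cascade can be made concrete with the simplest partition of a real-valued source output: a uniform scalar quantizer. In the hedged sketch below, the quantizer, the step size and the sample values are illustrative assumptions rather than part of Berger's treatment; `source_encode` returns the index $m$ of the equivalence class containing the observed output, and `source_decode` returns a representative of class $m'$ as the estimate of $x(t)$. The channel is assumed noiseless, so $m' = m$.

```python
import math

def source_encode(x: float, step: float) -> int:
    """Index m of the equivalence class [m*step, (m+1)*step) containing x."""
    return math.floor(x / step)

def source_decode(m: int, step: float) -> float:
    """Representative (class midpoint) used as the estimate of the source output."""
    return (m + 0.5) * step

samples = [0.12, 0.49, 0.51, 1.73, -0.2]   # hypothetical source outputs
step = 0.5                                 # hypothetical class width
indices = [source_encode(x, step) for x in samples]     # sent over a noiseless channel
estimates = [source_decode(m, step) for m in indices]   # m' = m at the receiver
# Average squared-error distortion between outputs and their reconstructions
mse = sum((x - y) ** 2 for x, y in zip(samples, estimates)) / len(samples)
print(indices, estimates, round(mse, 4))
```

Coarser classes (larger `step`) mean fewer class indices to transmit, at the price of a larger average distortion; this is the trade-off that the rate distortion function quantifies.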

One of Shannon's notable results was that a communication system can be designed to achieve a level of fidelity $D$ provided that $R(D) < C$, where $C$ denotes the channel capacity. Putting it another way, if the receiver can tolerate an average amount of distortion $D$, the rate distortion $R(D)$ is the minimum effective rate at which the source must relay information to be reproduced within that level of tolerance.

Figure 2: Source-receiver encoding-decoding (adapted from Berger, 1971, Fig. 1.2.1, p. 4).

The rate at which a source produces information when perfect reproduction is insisted upon is the source entropy $H$. Given a distortion measure that assigns zero distortion to perfect reproduction (and only to it), we have $R(0) = H$. As $D$ increases, $R(D)$ is a monotonically decreasing, convex function which eventually reaches zero at some maximum value $D_{\max}$ of the distortion (see Berger, 1971, Chapter 1). This is a very basic observation, and typically in rate distortion theory one seeks a reduction of $H$ by either slowing down the emission of coding, or encoding the relevant languages at a lower rate. In view of Shannon's theorem, as long as $C > H$ we obtain appropriate fidelity in transmission. The inherent difficulties are clear, since the source rate may be corrupted by low memory and coding congestion; hence the need for a communicating system to evolve so as to recover the source data at the channel output in accordance with the Shannon estimate.
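For a concrete instance of these properties, consider a binary memoryless source emitting 1 with probability $p$, under the Hamming distortion measure ($d(j,k) = 0$ if $j = k$ and 1 otherwise). A standard textbook result gives $R(D) = H_b(p) - H_b(D)$ for $0 \le D < \min(p, 1-p)$ and $R(D) = 0$ beyond, where $H_b$ is the binary entropy. The choice of this source and of $p = 0.3$ in the sketch below is purely illustrative; it shows $R(0) = H$ and the monotone decrease to zero at $D_{\max} = \min(p, 1-p)$.

```python
import math

def binary_entropy(p: float) -> float:
    """H_b(p) in bits, with H_b(0) = H_b(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_distortion_bernoulli(p: float, D: float) -> float:
    """R(D) in bits for a Bernoulli(p) source with Hamming distortion."""
    d_max = min(p, 1 - p)
    if D >= d_max:
        return 0.0          # the source can be 'reproduced' without sending anything
    return binary_entropy(p) - binary_entropy(D)

p = 0.3
for D in [0.0, 0.05, 0.1, 0.2, 0.3, 0.4]:
    print(f"D={D:.2f}  R(D)={rate_distortion_bernoulli(p, D):.4f} bits")
```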



3.2 Average mutual information 

Having mentioned the rate distortion function $R(D)$, we now follow Berger (1971) to give its specific definition in terms of average mutual information (an alternative and equivalent definition of $R(D)$, and a statement of the Rate Distortion Theorem, will be given in the Appendix). Firstly, for $j, k$ running over a suitable alphabet, let us write a given conditional probability assignment as $Q(k|j)$, so that in the usual way we have an associated joint distribution $P(j,k) = P(j)Q(k|j)$. We express the average distortion as
We express the average distortion as 

$$ d(Q) = \sum_{j,k} P(j)\,Q(k|j)\,d(j,k), \tag{3.1} $$

where $d(\cdot,\cdot)$ denotes the distortion measure. A conditional probability assignment $Q(k|j)$ is said to be $D$-admissible if and only if $d(Q) \le D$. The set of all $D$-admissible conditional probability assignments we denote by

$$ \mathcal{Q}_D = \{\, Q(k|j) : d(Q) \le D \,\}. \tag{3.2} $$

Along with an average distortion $d(Q)$, we also have an average mutual information

$$ I(Q) = \sum_{j,k} P(j)\,Q(k|j)\,\log\left[\frac{Q(k|j)}{Q(k)}\right], \qquad Q(k) = \sum_{j} P(j)\,Q(k|j). \tag{3.3} $$

Then, for fixed $D$, the rate distortion function is defined as

$$ R(D) = \min_{Q \in \mathcal{Q}_D} I(Q). \tag{3.4} $$
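Definitions (3.1) to (3.4) can be evaluated directly. The sketch below computes $d(Q)$ and $I(Q)$ (in bits) for a given conditional assignment $Q(k|j)$ and approximates $R(D)$ by brute force: it searches a grid over all binary test channels, keeps the $D$-admissible ones, and takes the smallest mutual information. The grid search, the binary alphabets, the source distribution and the function names are illustrative assumptions; the approach only scales to tiny alphabets.

```python
import math

def avg_distortion(P, Q, d):
    """Equation (3.1): d(Q) = sum_{j,k} P(j) Q(k|j) d(j,k)."""
    return sum(P[j] * Q[j][k] * d[j][k]
               for j in range(len(P)) for k in range(len(Q[j])))

def mutual_information(P, Q):
    """Equation (3.3) in bits, with output marginal Q(k) = sum_j P(j) Q(k|j)."""
    n_out = len(Q[0])
    Qk = [sum(P[j] * Q[j][k] for j in range(len(P))) for k in range(n_out)]
    total = 0.0
    for j in range(len(P)):
        for k in range(n_out):
            joint = P[j] * Q[j][k]
            if joint > 0:                       # convention: 0 log 0 = 0
                total += joint * math.log2(Q[j][k] / Qk[k])
    return total

def rate_distortion_grid(P, d, D, steps=200):
    """Approximate (3.4) for binary alphabets by a grid over Q(1|0) and Q(1|1)."""
    best = math.inf
    for a in range(steps + 1):
        for b in range(steps + 1):
            q0, q1 = a / steps, b / steps
            Q = [[1 - q0, q0], [1 - q1, q1]]
            if avg_distortion(P, Q, d) <= D + 1e-12:   # Q is D-admissible, eq. (3.2)
                best = min(best, mutual_information(P, Q))
    return best

P = [0.7, 0.3]                  # hypothetical source distribution P(j)
hamming = [[0, 1], [1, 0]]      # d(j, k) = 0 if j == k else 1
R_grid = rate_distortion_grid(P, hamming, 0.1)
print(f"R(0.1) ~ {R_grid:.4f} bits")
```

For this source the exact value is $H_b(0.3) - H_b(0.1) \approx 0.412$ bits; the grid value comes out slightly above it (within about 0.01 bit at this resolution), because every grid channel is admissible but none coincides exactly with the optimal one.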

Observe that if a parameter $s$ represents the slope of the function $R(D)$ at a point $(D_s, R_s)$ generated parametrically, we have $R'(D_s) = s$ (this is not exactly trivial: see Berger, 1971, Theorem 2.5.1).
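The slope parameter $s$ is exactly what the Blahut-Arimoto algorithm uses to trace the curve numerically: for fixed $s < 0$ it alternates between $Q(k|j) \propto q(k)\,e^{s\,d(j,k)}$ and the output marginal $q(k) = \sum_j P(j)\,Q(k|j)$, and the converged pair gives a point $(D_s, R_s)$ with $R'(D_s) = s$ when $R$ is measured in nats. The algorithm is not discussed in the text; the sketch below is our own minimal version, with a fixed iteration count instead of a convergence test, and the source and slope are illustrative.

```python
import math

def blahut_arimoto(P, d, s, iters=1000):
    """One point (D_s, R_s) of the rate distortion curve for slope s < 0.

    R_s is returned in nats, so that dR/dD = s at D_s.
    """
    n_in, n_out = len(P), len(d[0])
    q = [1.0 / n_out] * n_out                     # initial output marginal q(k)
    for _ in range(iters):
        # Q(k|j) proportional to q(k) exp(s d(j,k))
        Q = []
        for j in range(n_in):
            w = [q[k] * math.exp(s * d[j][k]) for k in range(n_out)]
            z = sum(w)
            Q.append([wk / z for wk in w])
        # q(k) = sum_j P(j) Q(k|j)
        q = [sum(P[j] * Q[j][k] for j in range(n_in)) for k in range(n_out)]
    D_s = sum(P[j] * Q[j][k] * d[j][k] for j in range(n_in) for k in range(n_out))
    R_s = sum(P[j] * Q[j][k] * math.log(Q[j][k] / q[k])
              for j in range(n_in) for k in range(n_out) if Q[j][k] > 0)
    return D_s, R_s

P = [0.7, 0.3]                 # hypothetical binary source
hamming = [[0, 1], [1, 0]]
s = math.log(1 / 9)            # slope chosen so that D_s should equal 0.1
D_s, R_s = blahut_arimoto(P, hamming, s)
print(f"D_s = {D_s:.6f}, R_s = {R_s:.6f} nats")
```

For Hamming distortion on a binary source the slope at $D$ is $\ln[D/(1-D)]$ nats per unit distortion, so $s = \ln(1/9)$ should land on $D_s = 0.1$; a finite difference of $R_s$ against $D_s$ for nearby values of $s$ reproduces $s$ itself.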



3.3 Meaningful paths 

More formally, a pattern of sensory input is mixed in an unspecified but systematic algorithmic manner with a pattern 
of internal ongoing activity to create a path of combined signals $x = (a_0, a_1, \ldots, a_n, \ldots)$. Each $a_k$ thus represents some functional composition of internal and external signals. Wallace (2005) provides some neural network examples.






This path is fed into a highly nonlinear, but otherwise similarly unspecified, decision oscillator $h$, which generates an output $h(x)$ that is an element of one of two disjoint sets $B_0$ and $B_1$ of possible system responses. Let

$$ B_0 = \{b_0, \ldots, b_k\}, \qquad B_1 = \{b_{k+1}, \ldots, b_m\}. \tag{3.5} $$

Assume a graded response, supposing that if

$$ h(x) \in B_0, \tag{3.6} $$

the pattern is not recognized, and if

$$ h(x) \in B_1, \tag{3.7} $$

the pattern is recognized, and some action $b_j$, $k+1 \le j \le m$, takes place. Such oscillators may be influenced by 'forcing', when a signal is subjected to some impulse such that its frequency, and hence the response, adjusts accordingly with respect to that applied impulse. More familiar oscillating physical models react to this by exhibiting 'beats' and 'resonance', for instance.

The principal objects of formal interest are paths $x$ which, through information flow, trigger pattern recognition-and-response. That is, given a fixed initial state $a_0$, we examine all possible subsequent paths $x$ beginning with $a_0$ and leading to the event $h(x) \in B_1$. Thus $h(a_0, \ldots, a_j) \in B_0$ for all $0 < j < m$, but $h(a_0, \ldots, a_m) \in B_1$.
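The recognition condition amounts to scanning successive prefixes of a path until the oscillator's output first falls in $B_1$. Since the decision oscillator $h$ is left unspecified in the text, the saturating running-sum oscillator, the response sets and the sample paths below are hypothetical stand-ins chosen only to make the scan concrete.

```python
from typing import Callable, Optional, Sequence, Set

def first_recognition(path: Sequence[int],
                      h: Callable[[Sequence[int]], int],
                      B1: Set[int]) -> Optional[int]:
    """Return the first m with h(a_0, ..., a_m) in B1, or None if the
    pattern is never recognized along this (finite) path."""
    for m in range(len(path)):
        if h(path[: m + 1]) in B1:
            return m
    return None

def h_running_sum(prefix: Sequence[int]) -> int:
    """Hypothetical decision oscillator: saturating running sum of the signals."""
    return min(sum(prefix), 5)

B0 = {0, 1, 2, 3}   # responses b_0..b_k: pattern not recognized
B1 = {4, 5}         # responses b_{k+1}..b_m: pattern recognized, action taken

print(first_recognition([1, 0, 2, 1, 3], h_running_sum, B1))  # prints 3
print(first_recognition([0, 1, 1], h_running_sum, B1))        # prints None
```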

For each positive integer $n$, let $N(n)$ be the number of high probability grammatical/syntactical paths of length $n$ which begin with some particular $a_0$ and lead to the condition $h(x) \in B_1$. These are paths of combined signals as above that are structured according to some language. For short, we call such paths 'meaningful', assuming, not unreasonably, that $N(n)$ will be considerably less than the number of all possible paths of length $n$ leading from $a_0$ to the condition $h(x) \in B_1$.
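To illustrate how $N(n)$ can be much smaller than the number of all paths while $\log[N(n)]/n$ still has a finite limit, take a hypothetical 'grammar' on the binary alphabet $\{0, 1\}$ that forbids two consecutive 1s, and count paths of length $n$ starting at $a_0 = 0$. The grammar is our assumption, and it ignores the recognition condition $h(x) \in B_1$; it is chosen because $N(n)$ then follows the Fibonacci numbers, so $\log[N(n)]/n$ tends to $\log\varphi \approx 0.481$ nats, below the $\log 2 \approx 0.693$ nats of unrestricted paths.

```python
import math

def count_meaningful(n: int) -> int:
    """N(n) for a hypothetical grammar: binary paths of length n with a_0 = 0
    that never contain two consecutive 1s."""
    ends0, ends1 = 1, 0        # valid prefixes ending in 0 / in 1; start: (0,)
    for _ in range(n - 1):
        # a 0 may follow anything; a 1 may only follow a 0
        ends0, ends1 = ends0 + ends1, ends0
    return ends0 + ends1

for n in (10, 100, 1000):
    print(n, math.log(count_meaningful(n)) / n)
print("log(phi) =", math.log((1 + math.sqrt(5)) / 2), " log(2) =", math.log(2))
```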

One critical assumption which permits an inference on the necessary conditions constrained by the asymptotic limit theorems of information theory is that the finite limit

$$ H \equiv \lim_{n \to \infty} \frac{\log[N(n)]}{n} \tag{3.8} $$

(the 'uncertainty') both exists and is independent of the path $x$. The rate distortion principle applies as follows (Wallace, 2005): the restriction to meaningful sequences of symbols increases the rate at which information can be transmitted with arbitrarily small error, and the grammar/syntax of the path can be associated with a dual information source. Here we may assume a typical information source $X$ to be 'adiabatic', 'piecewise stationary' and 'ergodic' (APSE), and that a system engaging in a cognitive process is describable as such. The terms are explained as follows:

(1) 'Adiabatic' means that the changes are slow enough to allow the necessary limit theorems to function. 

(2) 'Stationary' means that within pieces the probabilities hardly change, and 'piecewise' means that these properties hold between phase transitions, which are described using renormalization methods (see Wallace, 2005).

(3) 'Ergodic' means that in the long term, correlated sequences of symbols are generated at an average rate equal to their 
(joint) probabilities. 

More specifically, the essence of 'adiabatic' is that, when the information source is parametrized according to some appropriate scheme, within continuous 'pieces' of that parametrization, alterations in parameter values occur slowly enough that the information source $X$ remains as close to stationary and ergodic as needed to put to work the fundamental limit theorems of information theory. In view of (3.8), the Shannon uncertainty of $X$ can be stated more specifically as (see e.g. Cover and Thomas, 1991):

$$ H[X] = \lim_{n \to \infty} \frac{H(X_1, \ldots, X_n)}{n}. \tag{3.9} $$
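The limit of block entropies per symbol can be checked numerically for a simple stationary ergodic source. The sketch below assumes a hypothetical two-state Markov chain with transition matrix `T` and stationary distribution `pi`, computes $H(X_1, \ldots, X_n)$ by enumerating all $2^n$ sequences, and compares $H(X_1, \ldots, X_n)/n$ with the known entropy rate of a stationary Markov chain, $-\sum_i \pi_i \sum_j T_{ij} \log T_{ij}$ (in nats).

```python
import itertools
import math

# Hypothetical stationary, ergodic two-state Markov source.
T = [[0.9, 0.1],
     [0.4, 0.6]]          # T[i][j] = Pr(X_{t+1} = j | X_t = i)
pi = [0.8, 0.2]           # stationary distribution: pi T = pi

def block_entropy(n: int) -> float:
    """H(X_1, ..., X_n) in nats, by enumerating all 2**n sequences."""
    H = 0.0
    for seq in itertools.product((0, 1), repeat=n):
        p = pi[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= T[a][b]
        H -= p * math.log(p)
    return H

# Entropy rate of a stationary Markov chain, H(X_2 | X_1)
rate = -sum(pi[i] * T[i][j] * math.log(T[i][j]) for i in range(2) for j in range(2))

for n in (1, 2, 4, 8, 12):
    print(n, block_entropy(n) / n, rate)
```

For a Markov chain each extra symbol adds exactly the entropy rate to the block entropy, so $H(X_1, \ldots, X_n)/n$ approaches the rate like $1/n$ from above.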



3.4 The fundamental homology 

We recall how the information source uncertainty was defined in equation (3.8). This is quite analogous to the free energy density of a physical system, equation (5.2), and the relevance to a cognitive process can be explained by the following steps. For instance, Feynman (1996) provides a series of physical examples, based in part on the research of C. H. Bennett into the thermodynamics of computing (Bennett, 1982), where this homology is, in fact, an identity, at least for very simple systems. Bennett (1982) argues, in terms of idealized irreducibly elementary computing machines, that the information contained in a message can be viewed as the work saved by not needing to recompute what has been transmitted, or, as Feynman (1996) puts it, the information contained in a message is proportional to the amount of free energy density needed to erase it. The essential argument is that computing, in any form, takes work. Thus the more complicated a cognitive process, measured by its information source uncertainty, the greater its energy consumption, and our ability to provide energy to the brain is limited: typically, a unit of brain tissue consumes an order of magnitude more energy than a unit of any other tissue.

The less information available to us concerning an event, the higher its entropy, and information retrieval is not without a cost in expenditure of energy, where 'cost' may be interpreted as the necessary number of bits needed to encode a message. The thermodynamic minimum of energy in terms of bits of information is $k_B T \log_e 2$ erg/bit ($= k_B T$ erg/nat). An information system is therefore efficient when the energy expended in retrieving information approaches this minimum. Specifically, if $P$ denotes the free energy expended per second (erg/sec) and $R$ the minimum number of nats/sec, the efficiency of the system is given by $\eta = k_B T R / P$ (see e.g. Berger, 1971).
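To give a sense of scale, the sketch below evaluates the thermodynamic minimum per bit and per nat in erg, and an efficiency. The efficiency is coded under the assumption $\eta = k_B T R / P$, with $R$ the information rate in nats/sec and $P$ the rate of free-energy expenditure in erg/sec, which is the reading that makes $\eta$ dimensionless; that reading, the temperature of 310 K and the values of $R$ and $P$ are our assumptions.

```python
import math

K_B = 1.380649e-16   # Boltzmann constant in erg/K

def min_energy_per_nat(T_kelvin: float) -> float:
    """Thermodynamic minimum k_B T, in erg per nat."""
    return K_B * T_kelvin

def min_energy_per_bit(T_kelvin: float) -> float:
    """Thermodynamic minimum k_B T ln 2, in erg per bit."""
    return K_B * T_kelvin * math.log(2)

def efficiency(T_kelvin: float, rate_nats_per_s: float, power_erg_per_s: float) -> float:
    """Assumed reading eta = k_B T R / P (dimensionless)."""
    return K_B * T_kelvin * rate_nats_per_s / power_erg_per_s

T = 310.0                          # roughly body temperature, K
print(min_energy_per_bit(T))       # about 2.97e-14 erg/bit
print(efficiency(T, 1e9, 1e6))     # hypothetical R = 1e9 nats/s, P = 1e6 erg/s
```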

In a similar spirit to Bennett's work, Li and Vitanyi (1992) consider the thermodynamic costs of computation and how 
certain thermodynamic considerations can give a recursively invariant notion of 'cognitive distance' using a kind of billiard 
dynamics approach. In this case a minimal cognitive distance between two objects will correspond to the minimal amount 
of work expended for a given cognitive transformation of objects, either by some computational procedure or by some 
neurocognitive function of the brain. A higher descriptive level, leading to more complex and protracted algorithms, in turn leads to greater Kolmogorov complexity (Li and Vitanyi, 1993). As a particular application of the Bennett/Feynman ideas
in the Global Workspace setting, Wallace (2007) argues that the cognitive disorder of 'inattentional blindness' emerges as 
a thermodynamic limit on processing capacity in a topologically-fixed global workspace, i.e. one which has been strongly 
configured about a particular task. Institutional and machine generalizations seem clear. 

4 Dynamic groupoids and their atlases 
4.1 Concept of a groupoid 

Many cognitive processes exhibit the patterns of dynamical systems (see e.g. Glazebrook and Wallace, 2009a). In such systems one aims to unify the internal and external symmetries, and to be able to reduce vast, myriad-like network configurations to manageable schemes involving the corresponding equivalence classes, analogous to those already mentioned in source encoding/decoding in §3.1 (see also §6.2 below). A precise way of doing this lies within the categorical concept known as a groupoid (see Brown, 2006; Connes, 1994; Weinstein, 1996). In essence, a groupoid $G$ consists of both a set of objects $X$ and a set of morphisms, or 'arrows', each of which runs from a source object to a range object in $X$, and all such morphisms admit an inverse.

Remark 4.1. The most familiar example of a groupoid, as known to students of algebra, is that of a 'group', which is a groupoid with a single object (and hence a single identity). Hence groupoids can be viewed as extensions of the 'group' concept to multiple objects, each with its own identity, thus providing a wide scope of applications to the dynamics of neurocognitive and socio-bioinformatic systems (see e.g. Baianu et al., 2006; Glazebrook and Wallace, 2009a, 2009b; Golubitsky and Stewart, 2006; Stewart et al., 2003; Wallace, 2005; Wallace and Fullilove, 2008).

A groupoid can be depicted by 

$$ \alpha, \beta : G \rightrightarrows X, \tag{4.1} $$

where the maps $\alpha, \beta$ from morphisms onto objects are called the range and source maps, respectively. Informally, the groupoid represents a feature of built-in reciprocity between its algebraic structures, internalizing and externalizing the prevailing symmetries. The groupoid operations satisfy certain algebraic relations of associativity, existence of two-sided identities, etc. (details can be seen in e.g. Brown, 2006; Connes, 1994; Weinstein, 1996). A groupoid can here be understood in relation to a linkage by a meaningful path of an information source dual to a cognitive process, for which the underlying principle is that states $a_j, a_k$ in a set $A$ are related by the groupoid morphism if and only if there exists a high probability grammatical path connecting them to the same base point; the tuning across the various possible ways in which that can happen (the different cognitive languages) parametrizes the set of equivalence relations and creates the groupoid.

Example 4.1. Since we have already mentioned equivalence classes in the context of source encoding/decoding, it seems appropriate to see how an equivalence relation $\mathcal{R}$ defined on (a set) $X$ takes shape as a groupoid. Here we have the two projections $\alpha, \beta : \mathcal{R} \to X$, and a product $(x,y)(y,z) = (x,z)$ whenever $(x,y), (y,z) \in \mathcal{R}$, together with an identity, namely $(x,x)$, for each $x \in X$. Moreover, the essential equivalence relations (classes) derived from a system's space (network) arise from the orbit equivalence relation of some groupoid $G$ acting on that space (see e.g. Weinstein, 1996). In the context of connected (sub)networks/graphs which can be reduced to equivalence classes, natural groupoid structures come about in accordance with equivalence classes of relations $\mathcal{R}(x,y)$, as above, simply interpreted as having an edge linking node $x$ to node $y$. Conversely, a groupoid (of equivalence relations) admits an underlying graph structure via its implicit scheme of objects and morphisms between objects (for details, see e.g. Brown, 2006). Thus we have two-way associations whereby 'objects' can be identified with 'nodes', and 'morphisms' with 'edges', in groupoids (of equivalence relations) and networks, respectively:

$$\text{Network} \xrightarrow{\ \text{equivalence relation}\ } \text{Groupoid}$$

$$\text{Network} \xleftarrow{\ \text{underlying graph}\ } \text{Groupoid}$$
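As an illustration of Example 4.1, the following Python sketch builds the groupoid of the orbit equivalence relation of a small network: objects are nodes, arrows are related pairs $(x, y)$, with range map $\alpha(x,y) = x$, source map $\beta(x,y) = y$ and product $(x,y)(y,z) = (x,z)$. It is a minimal sketch under our own assumptions; the function names are hypothetical and do not come from the cited works.

```python
from itertools import product

def components(nodes, edges):
    """Connected components of an undirected graph (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for x, y in edges:
        parent[find(x)] = find(y)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())

def equivalence_groupoid(nodes, edges):
    """Arrows of the groupoid R: every pair (x, y) lying in the same component."""
    return {(x, y) for comp in components(nodes, edges)
            for x, y in product(comp, repeat=2)}

def alpha(arrow):   # range map
    return arrow[0]

def beta(arrow):    # source map
    return arrow[1]

def compose(a, b):
    """(x, y)(y, z) = (x, z); defined only when beta(a) == alpha(b)."""
    if beta(a) != alpha(b):
        raise ValueError("arrows are not composable")
    return (alpha(a), beta(b))

def identity(x):
    return (x, x)

def inverse(a):
    return (beta(a), alpha(a))

def underlying_graph(arrows):
    """Edges recovered from the groupoid: its non-identity arrows."""
    return {a for a in arrows if alpha(a) != beta(a)}

# Hypothetical network: a path a-b-c and a separate edge d-e
G = equivalence_groupoid("abcde", [("a", "b"), ("b", "c"), ("d", "e")])
```

Passing back from groupoid to graph returns every non-identity arrow, that is, the symmetric and transitive closure of the original edge set rather than the edges themselves: the equivalence relation only remembers which nodes are connected.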

4.2 Groupoid atlases 

An important observation about multi-tasking in institutional and distributed cognitive systems concerns how the various
submodules interact. When a given subnetwork is represented in its groupoid form, such interactions can naturally be
realized in terms of groupoid actions, a topic that warrants further attention. We would then like to view the overall cognitive
system in terms of such interacting groupoids, described by an 'atlas' of them that, as shown in Glazebrook and Wallace
(2009a), contains the representation of several possible emergent 'giant components' induced by the outcome of local
group(oid) actions within the Workspace (see Figure 3). A workable concept seems to be that of a groupoid atlas (Bak
et al., 2006), which provides a schematic representation of coupling interactions between multi-agent systems and pastes
together the local dynamic groupoid actions with the net effect of a 'global' groupoid.

One commences from a family of dynamically interacting groupoids $(G_\alpha) = \{G_1, G_2, \ldots\}$ where each groupoid has the
same set of objects; this family is called a single domain or multiple groupoid. A groupoid atlas is then defined as a set with
a covering by patches, each of which comprises a single domain with global action, representing the local processing which
is then globalized across the atlas. This is a desirable effect, and one particularly suited to logically inscribing processors
or sensors (the 'agents') within the cognitive modules of the Workspace. As a descriptive mechanism, this atlas has the
advantage of admitting a weaker structure than that of a conventional manifold, since no condition of compatibility
between arbitrary overlaps of the patches is necessary. This is a key property for cognitive modules that can be geared to
equivalence class representations, where flexibility in the structure is a natural characteristic. In this way, the atlas provides
a convenient description of a web of complexity representing the dynamic reciprocity of tightly knit functional systems, as
was applied to small world networks (Glazebrook and Wallace, 2009a). Time and space do not permit including a mathematical
outline of the construction; for the technical details we refer the reader to Bak et al. (2006) and del Hoyo and Minian (2009).
However, we intend to apply this concept with some essential details in §5.3 below.



5 Information and the Onsager relations 

5.1 A fundamental homology and the Onsager relations 

The information-supported interactive cognitive modules we have in mind are assumed to possess their own internal
metabolisms and mechanisms of self-organization, reflecting vital biochemical processes. Just as for the latter, the
evolution of these 'cognitive cells' is characterized by reactive states of thermal nonequilibrium, in accord with the laws of
thermodynamics, and by the capability to assimilate information, which we study in terms of source uncertainty. This we
can achieve by applying the Onsager relations of nonequilibrium thermodynamics (Kurzynski, 2006; Landau and Lifshitz,
2007). The reasoning starts by observing that a fundamental homology between the information source uncertainty dual to a
cognitive process and the free energy density of a physical system arises mainly from the formal similarity between their
definitions in the asymptotic limit.



5.2 The Groupoid Free Energy Density 

Recall that for a thermodynamic state of a given system at fixed temperature T with energy E and entropy S, the free energy 
density F is defined to be 

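$$F = E - TS. \tag{5.1}$$

In the Hamiltonian formalism one takes the volume $V$ and the partition function $Z(K)$ derived from the system's Hamiltonian
at inverse temperature $K$ (Kurzynski, 2006; Landau and Lifshitz, 2007). The free energy density is then defined to be

$$F[K] = \lim_{V \to \infty} -\frac{1}{K}\,\frac{\log[Z(K,V)]}{V} \equiv \lim_{V \to \infty} \frac{\log[\hat{Z}(K,V)]}{V}, \quad \text{where } \hat{Z} = Z^{-1/K}. \tag{5.2}$$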
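The thermodynamic limit in (5.2) can be checked on a textbook system. The sketch below uses a nearest-neighbour Ising ring of $V$ spins with coupling $J$, a hypothetical stand-in chosen by us rather than a system discussed in the text. It compares brute-force enumeration of $Z(K,V)$ with the transfer-matrix result, whose eigenvalues are $2\cosh KJ$ and $2\sinh KJ$; as $V \to \infty$, $-\log Z/(KV)$ should tend to $-\log(2\cosh KJ)/K$.

```python
import math
from itertools import product

def log_partition_brute(V, K, J=1.0):
    """log Z(K, V) for an Ising ring of V spins, summing over all 2**V states."""
    total = 0.0
    for spins in product((-1, 1), repeat=V):
        # periodic boundary: spin V-1 couples back to spin 0
        energy = -J * sum(spins[i] * spins[(i + 1) % V] for i in range(V))
        total += math.exp(-K * energy)
    return math.log(total)

def log_partition_transfer(V, K, J=1.0):
    """Same quantity from the transfer-matrix eigenvalues 2cosh(KJ) and 2sinh(KJ)."""
    lam_plus, lam_minus = 2 * math.cosh(K * J), 2 * math.sinh(K * J)
    # factor out the dominant eigenvalue so that large V does not overflow
    return V * math.log(lam_plus) + math.log1p((lam_minus / lam_plus) ** V)

def free_energy_density(V, K, J=1.0):
    """Finite-volume estimate -(1/K) log Z(K, V) / V, cf. (5.2)."""
    return -log_partition_transfer(V, K, J) / (K * V)

def free_energy_density_limit(K, J=1.0):
    """V -> infinity: only the largest eigenvalue survives."""
    return -math.log(2 * math.cosh(K * J)) / K

for V in (4, 16, 64):
    print(V, free_energy_density(V, K=0.7), free_energy_density_limit(K=0.7))
```

The finite-$V$ correction is $-\log(1 + \tanh^V KJ)/(KV)$, which vanishes quickly, so the printed estimates approach the limit as $V$ grows.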

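Consider now an information source $H_{G_\alpha}$ over a corresponding groupoid $G_\alpha$. The probability of $H_{G_\alpha}$ is given by

$$P[H_{G_\alpha}] = \frac{\exp[-H_{G_\alpha} K]}{\sum_\beta \exp[-H_{G_\beta} K]}, \tag{5.3}$$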
[Figure 3: diagram with labels 'Network 1' and 'Groupoid Action']

Figure 3: Displayed are groupoids $G_1, G_2, G_3$ derived from the equivalence relations of their respective networks of multi-agent
type cognitive modules. They are viewed as the components of a groupoid atlas. The groupoid actions are indicated by dotted
arrows, in this case between $G_1$ and $G_2$, and may represent the formation of network linkages as the system is shifted via
crosstalk and, for instance, how a 'giant component' emerges (see e.g. Glazebrook and Wallace, 2009a). This is a descriptive
mechanism that could be seen, for instance, as enveloping the underlying networks of the 'reactive' system in Figure 1, and
is applicable to the content of §5.3.
where the normalizing sum is over all possible subgroupoids of the largest available symmetry groupoid. Now let

$$Z_G = \sum_\alpha \exp[-H_{G_\alpha} K]. \tag{5.4}$$

The groupoid free energy density (GFE) of the system, $F_G$, at inverse normalized equivalent temperature $K$ is defined as

$$F_G[K] = -\frac{1}{K}\log[Z_G(K)]. \tag{5.5}$$

With each such groupoid $G_\alpha$ of the (large) cognitive groupoid we can associate a dual information source $H_{G_\alpha}$. We recall
the rate distortion function $R(D)$ between the message sent by the cognitive process and the observed impact, while noting that
both $H_{G_\alpha}$ and $R(D)$ may be considered as free energy density measures. In a sense, $R(D)$ constitutes a sort of 'thermal bath'
for the process of cognition. The probability of the dual information source can then be expressed by

$$P[H_{G_\alpha}] = \frac{\exp[-H_{G_\alpha}/\kappa R(D)\tau]}{\sum_\beta \exp[-H_{G_\beta}/\kappa R(D)\tau]}, \tag{5.6}$$

where $\kappa$ denotes a suitable dimensionless constant characteristic of the system in the context of a fixed machine response
time $\tau$. The sum is over all possible subgroupoids of the largest available symmetry groupoid. Accordingly, the term $\kappa R(D)$
represents a 'rate distortion energy', in this case a kind of 'temperature analog'. In the context of a fixed $\tau$, a decline in
$R(D)$ (on an increase in average distortion) acts to 'lower the machine temperature', driving it to simpler and less rich
behaviors.
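A minimal numerical reading of (5.3)-(5.6), assuming a handful of made-up uncertainties $H_{G_\alpha}$ and treating $\kappa R(D)\tau$ simply as a temperature. Lowering $R(D)$ shifts probability toward the subgroupoid with the smallest uncertainty, which we take, as the text suggests, to stand for the simpler behavior. The values and function names are hypothetical.

```python
import math

def groupoid_probabilities(H, temperature):
    """Gibbs-like weights exp(-H_alpha / temperature), normalized over alpha.

    temperature = 1/K reproduces (5.3); temperature = kappa * R(D) * tau gives (5.6).
    """
    h_min = min(H)  # shifting by h_min leaves the ratios unchanged
    w = [math.exp(-(h - h_min) / temperature) for h in H]
    total = sum(w)
    return [x / total for x in w]

def groupoid_free_energy(H, K):
    """F_G[K] = -(1/K) log Z_G(K), with Z_G = sum_alpha exp(-K H_alpha), cf. (5.4)-(5.5)."""
    return -math.log(sum(math.exp(-K * h) for h in H)) / K

# Hypothetical source uncertainties of three subgroupoids (simplest first)
H = [0.5, 1.0, 2.0]
kappa, tau = 1.0, 1.0
for R in (2.0, 0.2):  # a decline in R(D) 'lowers the machine temperature'
    print(R, groupoid_probabilities(H, kappa * R * tau))
```

Since $Z_G \ge \exp[-K \min_\alpha H_{G_\alpha}]$, the groupoid free energy never exceeds the smallest uncertainty and approaches it as $K \to \infty$.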



5.3 A groupoid atlas of information sources 

The groupoids $G_\alpha$ can indeed be taken to comprise a groupoid atlas $\mathcal{A}$ on an appropriate set $X_A$. In Bak et al. (2006)
this is motivated by considering a group $G$ in the standard algebraic sense, and a family

$$\{(G_A)_\alpha \curvearrowright (X_A)_\alpha : \alpha \in \Phi_A\} = \{(G_A)_\alpha \times (X_A)_\alpha \to (X_A)_\alpha : \alpha \in \Phi_A\} \tag{5.7}$$

of group actions '$\curvearrowright$' on subsets $(X_A)_\alpha \subseteq X_A$, where the local groups $(G_A)_\alpha$ and the corresponding subsets
$(X_A)_\alpha$ are indexed by an indexing set $\Phi_A$, called the coordinate system of $\mathcal{A}$, which is seen to satisfy certain conditions
(Bak et al., 2006). Now the family of local groups can be replaced by a family of local groupoids $(G_A)_\alpha$ defined with respective
object sets $(X_A)_\alpha$, and with a coordinate system $\Phi_A$ that is equipped with a reflexive relation denoted by $\le$. These data
are to satisfy the following conditions (Bak et al., 2006):

(1) If $\alpha \le \beta$ in $\Phi_A$, then $(X_A)_\alpha \cap (X_A)_\beta$ is a union of components of $(G_A)_\alpha$; that is, if $x \in (X_A)_\alpha \cap (X_A)_\beta$
and $g \in (G_A)_\alpha$ acts as $g : x \to y$, then $y \in (X_A)_\alpha \cap (X_A)_\beta$.

(2) If $\alpha \le \beta$ in $\Phi_A$, then there is a groupoid morphism defined between the restrictions of the local groupoids to intersections

$$\phi^\alpha_\beta : (G_A)_\alpha\big|_{(X_A)_\alpha \cap (X_A)_\beta} \to (G_A)_\beta\big|_{(X_A)_\alpha \cap (X_A)_\beta}, \tag{5.8}$$

which is the identity morphism on objects.
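Condition (1) is easy to test mechanically once each local groupoid is described by its orbits (components). The sketch below reports every pair $\alpha \le \beta$ for which the intersection of object sets cuts through an orbit of the local groupoid indexed by $\alpha$. The data representation and function names are hypothetical; the relation is assumed to be given already reflexive, and condition (2), which requires explicit morphisms between the restricted groupoids, is not checked.

```python
def is_union_of_orbits(subset, orbits):
    """True if every orbit lies entirely inside `subset` or entirely outside it."""
    return all(orbit <= subset or not (orbit & subset) for orbit in orbits)

def violations_of_condition_1(atlas, leq):
    """Pairs (alpha, beta) with alpha <= beta that break condition (1).

    atlas : dict mapping alpha to the orbits (sets of objects) of the local
            groupoid (G_A)_alpha; its object set (X_A)_alpha is their union.
    leq   : set of pairs (alpha, beta) encoding the reflexive relation <=.
    """
    objects = {a: set().union(*orbits) for a, orbits in atlas.items()}
    bad = []
    for a, b in sorted(leq):
        overlap = objects[a] & objects[b]
        if not is_union_of_orbits(overlap, atlas[a]):
            bad.append((a, b))
    return bad

# Hypothetical toy atlas with three local groupoids
atlas = {"G1": [{"x", "y"}, {"z"}], "G2": [{"z"}, {"w"}], "G3": [{"y", "w"}]}
leq = {("G1", "G1"), ("G2", "G2"), ("G3", "G3"), ("G1", "G2"), ("G1", "G3")}
print(violations_of_condition_1(atlas, leq))
```

Here the overlap of the object sets of G1 and G3 is $\{y\}$, which splits the orbit $\{x, y\}$ of G1, so only that pair is reported.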

Thus each of the $G_\alpha$, with its associated dual information source $H_{G_\alpha}$, constitutes a component of an atlas which
incorporates the dynamics of an (inter)reactive system through information sources, by means of the intrinsic (groupoid)
actions. For simplicity, let us refer to this atlas as $\mathcal{A}$. Suppose we have another such atlas $\mathcal{B}$ representing a separate
system which can be related to $\mathcal{A}$ via a suitable transformation. To make matters precise we consider a morphism
$f : \mathcal{A} \to \mathcal{B}$ prescribed by a triple $(X_f, \Phi_f, G_f)$ satisfying (del Hoyo and Minian, 2009):

(1) $X_f : X_A \to X_B$ is a set-theoretic function.

(2) $\Phi_f : \Phi_A \to \Phi_B$ is a function that preserves the relation $\le$.

(3) $G_f : G_A \to G_B$ is a (generalized) natural transformation of groupoid diagrams over the function $\Phi_f$ which restricts to
$X_f$ on objects.
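These conditions can be summarized in the following straightforward way. For each $\alpha$, a groupoid morphism
$G_{f(\alpha)} : (G_A)_\alpha \to (G_B)_{\Phi_f(\alpha)}$ is given such that, on objects, $\mathrm{Obj}(G_{f(\alpha)}) = X_f|_{(X_A)_\alpha}$, and, if $\alpha \le \beta$,
the diagram (with each groupoid restricted to the relevant intersection of object sets)

$$\begin{array}{ccc}
(G_A)_\alpha & \xrightarrow{\;G_{f(\alpha)}\;} & (G_B)_{\Phi_f(\alpha)} \\
\big\downarrow {\scriptstyle \phi^\alpha_\beta} & & \big\downarrow {\scriptstyle \phi^{\Phi_f(\alpha)}_{\Phi_f(\beta)}} \\
(G_A)_\beta & \xrightarrow{\;G_{f(\beta)}\;} & (G_B)_{\Phi_f(\beta)}
\end{array} \tag{5.9}$$

is commutative.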



5.4 Biochemical data compression 

Further motivation is provided by considering phases as chemically and thermodynamically homogeneous when formally
compared with average mutual information, as applied to living systems that are capable of assimilating and using
information reaped either from their environment or from information intrinsic to their particular system via genetic
characteristics. Here we reflect in part upon the theme of §3.4, in which we discussed certain physical means by which
information runs at the cost of expending free energy. In keeping with the fundamental homology, it befits us to consider
some physical equations that are homologous to the variational calculus of the rate distortion function, and to follow to some
extent Berger (1971, §6.4). The idea is that when a living information system is confronted with an environment, it reacts
towards it at an atomic-molecular level. Given some thermodynamic system, let $n_{jk}$ be the number of atomic weights
of substance $j$ that end up in phase $k$, and let $n_j$ be the number of atomic weights that were introduced originally, so that
$n_j = \sum_k n_{jk}$. The multiphase (chemical) equilibrium problem involves determining the $n_{jk}$. To proceed, let us express the
free energy as $F = F_1 + F_2$, where

$$F_1 = \sum_{j,k} n_{jk} c_{jk}, \qquad F_2 = \sum_{j,k} n_{jk} \log\Big[\frac{n_{jk}}{\sum_i n_{ik}}\Big], \tag{5.10}$$

where the $c_{jk}$ are the free energy constants, and where the term contained in the logarithm is the chemical potential of the
reactant $j$ in phase $k$. Let now $n = n_0 + \cdots + n_{M-1}$ (for some $M$), $P_j = n_j/n$, $Q_{k|j} = n_{jk}/n_j$ and $Q_k = \sum_j n_{jk}/n$; then

$$F = F_1 + F_2 = n \sum_{j,k} P_j Q_{k|j}\Big[c_{jk} + \log\Big[\frac{Q_{k|j}}{Q_k}\Big]\Big] + n \sum_j P_j \log P_j. \tag{5.11}$$



Since in principle $n$ and the $P_j$ can be determined, the minimization of $F$ reduces to minimizing the double summation on
the right side of (5.11). Letting $d_{jk} = -c_{jk}/s$, 'nature' then selects the $n_{jk}$ so as to minimize the quantity

$$V = \sum_{j,k} P_j Q_{k|j}\Big[\log\Big[\frac{Q_{k|j}}{Q_k}\Big] - s\,d_{jk}\Big]. \tag{5.12}$$

The crucial observation here is that (5.12) is formally the same as

$$V = I(Q) - s\,d(Q); \tag{5.13}$$

in other words, the function to be minimized is the difference between the average mutual information and $s$ times the average
distortion. This is minimized by an appropriate choice of $Q = Q(k|j)$ so as to determine a point on the $R(D)$ curve where,
as in §3.2, we have $R'(D) = s$. The multiphase chemical equilibrium is a local process; the overall system itself may in general
remain in a state far from equilibrium.
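To make the minimization of (5.13) concrete, here is a sketch of the standard Blahut-Arimoto iteration; this is our choice of technique, as the text itself only states the variational problem. For a fixed slope $s < 0$ it alternates $Q(k|j) \propto Q_k\, e^{s d_{jk}}$ and $Q_k = \sum_j P_j Q(k|j)$, and returns the point $(D, R)$ on the $R(D)$ curve with $R'(D) = s$. The sign convention ($s \le 0$), the use of nats and the function name are assumptions for illustration.

```python
import math

def blahut_arimoto(P, d, s, iters=200):
    """Point (D, R) on the R(D) curve with slope s < 0 (rates in nats).

    Minimizes I(Q) - s * d(Q) over test channels Q(k|j), cf. (5.12)-(5.13).
    P : source probabilities P_j
    d : distortion matrix d[j][k]
    """
    J, K = len(P), len(d[0])
    q = [1.0 / K] * K  # output marginal Q_k, initialized uniformly
    for _ in range(iters):
        # Q(k|j) proportional to Q_k * exp(s * d_jk)
        cond = []
        for j in range(J):
            w = [q[k] * math.exp(s * d[j][k]) for k in range(K)]
            z = sum(w)
            cond.append([x / z for x in w])
        # Q_k = sum_j P_j Q(k|j)
        q = [sum(P[j] * cond[j][k] for j in range(J)) for k in range(K)]
    D = sum(P[j] * cond[j][k] * d[j][k] for j in range(J) for k in range(K))
    R = sum(P[j] * cond[j][k] * math.log(cond[j][k] / q[k])
            for j in range(J) for k in range(K) if cond[j][k] > 0)
    return D, R

# Equiprobable binary source with Hamming distortion
D, R = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], s=-2.0)
```

For this source $R(D) = \log 2 - h(D)$ in nats, whose slope is $\log[D/(1-D)]$, so the iteration should return $D = e^{s}/(1 + e^{s})$; this gives a quick sanity check of the sketch.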

According to Berger (1971), 'nature' automatically performs the analogous minimization in multiphase chemical
equilibrium, and when an information system encounters an environment, the reaction on the molecular level follows these
principles. Thus the free energy of the combined system and the environment is minimized, although this is at the cost of
the system's free energy capacity, and there will inevitably be some heat dissipation. Eventually the system can fine-tune
its coding mechanism and reconfigure itself for the task of gaining more specialized knowledge about its environment and,
thanks to newly acquired information, it may engage the latter to its advantage along with detecting other drifting cells of
information. How such optimizing procedures can be realized in an evolutionary context is, of course, one of the main tasks
of applying rate distortion arguments (cf. Wallace and Wallace, 1998, 1999, 2008, 2009). For instance, a scenario studied in
Tlusty (2007, 2008) concerns how the genetic code maps the 64 nucleotide triplets (codons) to 20 amino acids. In terms of
this mapping, the code is viewed as a noisy information channel, and the claim is that evolutionary characteristics determine
the emergence of the code via an appropriate selection of amino acids which minimizes the risk of errors; the code then
emerges at a 'supercritical' phase transition once the mapping ceases to be random.
6 The Onsager relations in the context of information



6.1 The basic equations 

Understanding the time dynamics of cognitive systems away from phase transition critical points thus requires a
phenomenology similar to the Onsager relations. If the dual source uncertainty of a cognitive process is parametrized by some
vector of quantities $K = (K_1, \ldots, K_m)$, then, in analogy with nonequilibrium thermodynamics, the gradients in the $K_j$ of the
disorder, defined as

$$S = H(K) - \sum_{j=1}^{m} K_j\, \partial H/\partial K_j, \tag{6.1}$$

become of central interest. Equation (6.1) is similar to the definition of entropy in terms of the free energy density of a
physical system, as suggested by the homology between free energy density and information source uncertainty described
above. Pursuing the homology further, the generalized Onsager relations defining temporal dynamics become

$$dK_j/dt = \sum_i L_{j,i}\, \partial S/\partial K_i, \tag{6.2}$$

where the kinetic coefficients $L_{j,i}$ are, to first order, constants interpreted as reflecting the nature of the underlying cognitive
phenomena. The partial derivatives $\partial S/\partial K_i$ are analogous to thermodynamic forces in a chemical system, and may be subject
to override by external physiological driving mechanisms, as shown in Wallace (2005) and Wallace and Fullilove (2008), along
with further extensions of these dynamical procedures.
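To make (6.1) and (6.2) concrete, the sketch below evaluates the disorder $S$ by central differences and integrates the generalized Onsager relations with explicit Euler steps. The uncertainty $H(K) = \log(1 + |K|^2)$, the non-symmetric matrix $L$ (permitted by Remark 6.1) and the step sizes are illustrative assumptions only, not quantities from the text.

```python
import math

def grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at the point x (a list)."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def disorder(H, K):
    """S = H(K) - sum_j K_j dH/dK_j, cf. (6.1)."""
    return H(K) - sum(k * dk for k, dk in zip(K, grad(H, K)))

def onsager_trajectory(H, L, K0, dt=0.01, steps=1000):
    """Explicit Euler steps of dK_j/dt = sum_i L[j][i] dS/dK_i, cf. (6.2)."""
    def S(K):
        return disorder(H, K)

    K = list(K0)
    path = [K]
    for _ in range(steps):
        force = grad(S, K, h=1e-4)  # analog of the thermodynamic forces dS/dK_i
        K = [K[j] + dt * sum(L[j][i] * force[i] for i in range(len(K)))
             for j in range(len(K))]
        path.append(K)
    return path

def H(K):
    """Hypothetical source uncertainty H(K) = log(1 + |K|^2)."""
    return math.log(1 + sum(k * k for k in K))

L = [[1.0, 0.3], [-0.3, 1.0]]  # hypothetical kinetic coefficients, L_ji != L_ij
path = onsager_trajectory(H, L, [0.3, 0.4])
```

Because the symmetric part of this $L$ is positive definite, $dS/dt = \nabla S \cdot L \nabla S \ge 0$, so $S$ should increase along the computed trajectory; the antisymmetric part only rotates the flow.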

Remark 6.1. Equation (6.2) is 'general' in the sense that we do not necessarily assume the symmetry condition $L_{ji} = L_{ij}$,
which would express Onsager's reciprocal relations, sometimes called the fourth law of thermodynamics (see e.g. (3.7) in
Kurzynski, 2006). The matrix $L = [L_{ij}]$ is to be viewed empirically, in the same spirit as the slope and intercept of a regression
model, and may have a structure far different from that of the more basic, more familiar chemical or physical processes.
Generally, information sources are notoriously one-way in time, as exemplified by the patent linguistic scarcity of palindromic
structures that actually make sense.



6.2 Equivalence classes of information sources 

Equations (6.1) and (6.2) can be derived in a simple, parameter-free, covariant manner which relies on the underlying topology
of the information source space implicit to a process. Different cognitive phenomena have, according to our development,
dual information sources, and we are interested in the local properties of the system near a particular reference state. We
impose a topology on the system so that, near a particular 'language' $A$ dual to an underlying cognitive process, there is (in
some sense) an open set $U$ of closely similar languages $\hat{A}$, such that $A$ and $\hat{A}$ are subsets of $U$. Note that it may be necessary
to coarse-grain the system's responses to define these information sources. The problem is to proceed in such a way as to
preserve the underlying essential topology while eliminating 'high frequency noise'. The formal tools for this can be found
in e.g. Burago et al. (2001).

Since the information sources dual to the cognitive processes are similar, for all pairs of languages $A, \hat{A}$ in $U$ it is possible
to make use of the following:

(1) Create an embedding alphabet which includes all symbols allowed to both of them.

(2) Define an information-theoretic distortion measure in that extended, joint alphabet between any meaningful (i.e. high-
probability grammatical/syntactical) paths in $A$ and $\hat{A}$, which we write as $d(A_x, \hat{A}_x)$ (Ash, 1990; Cover and Thomas,
1991). Note that these languages do not interact in this approximation.

(3) Define a metric on $U$, for example

$$\mathcal{M}(A, \hat{A}) = \left|\lim \frac{\int_{A,\hat{A}} d(A_x, \hat{A}_x)}{\int_{A,A} d(A_x, A_x)} - 1\right|, \tag{6.3}$$

using an appropriate integration limit argument over the high-probability paths. The usual metric properties apply as
in Burago et al. (2001).

Note that these conditions can be used to define equivalence classes of languages, whereas previously we defined equivalence
classes of states which could be linked by meaningful paths to some base point. This led to the characterization of different
information sources, from which a formal topological manifold, which is an equivalence class of information sources, can
be constructed. As a working hypothesis we may assume this to be a standard differentiable manifold in which the set
of such equivalence classes generates the dynamical groupoid (cf. Glazebrook and Wallace, 2009a, 2009b), and then study
those mechanisms, internal or external, which can break that groupoid symmetry. In particular, the imposition of a metric
structure on this groupoid, and on its base set, would permit a nontrivial interaction between orbit equivalence relations and
isotropy groups, leading to interesting algebraic structures.

Since $H$ and $\mathcal{M}$ are both scalars, a 'covariant' derivative can be defined directly as

$$\frac{dH}{d\mathcal{M}} \equiv \lim_{\hat{A} \to A} \frac{H(A) - H(\hat{A})}{\mathcal{M}(A, \hat{A})}, \tag{6.4}$$

where $H(A)$ is the source uncertainty of language $A$. Suppose the system is set in some reference configuration $A_0$. To obtain
the unperturbed dynamics of that state, impose a Legendre transform using this derivative, defining another scalar

$$S = H - \mathcal{M}\, dH/d\mathcal{M}. \tag{6.5}$$

The simplest possible Onsager relation (here seen as an empirical, fitted equation, like a regression model) in this case
becomes

$$d\mathcal{M}/dt = L\, dS/d\mathcal{M}, \tag{6.6}$$

where $t$ is the time and $dS/d\mathcal{M}$ represents an analog to the thermodynamic force in a chemical system (cf. §5.4). Relevant here
are patterns of oscillatory-like behavior where a weak signal is amplified by the presence of noise as a result of some synchronized
hopping around local extrema. The standard terminology for this phenomenon is stochastic resonance (Gammaitoni et al.,
1998), and we will proceed to give some idea of how this can be related to certain types of cognitive processes. Since we are
working with stochastic differential equations, the first step is to modify the equation of thermodynamic force accordingly.
To this end, equation (6.6) is rewritten as

$$d\mathcal{M}/dt = L\, dS/d\mathcal{M} + \sigma W(t), \tag{6.7}$$

where $\sigma$ is a constant and $W(t)$ represents a white noise term. Again, the quantity $S$ is seen as a function of the parameter
$\mathcal{M}$. This leads directly to a family of classic stochastic differential equations expressed as differential 1-forms

$$d\mathcal{M}_t = L(t, dS/d\mathcal{M})\, dt + \sigma(t, dS/d\mathcal{M})\, dB_t, \tag{6.8}$$

where $L$ and $\sigma$ are appropriately regular functions of $t$ and $\mathcal{M}$, and $dB_t$ represents the noise structure.
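Equation (6.8) with constant $L$ and $\sigma$ can be simulated with the Euler-Maruyama scheme, in which each Brownian increment is drawn from a normal distribution with variance $\Delta t$. The double-well disorder $S(\mathcal{M}) = \mathcal{M}^2/2 - \mathcal{M}^4/4$, with maxima at $\mathcal{M} = \pm 1$, is a hypothetical choice made only to show noise-driven hopping between local extrema; no periodic forcing is included, so this sketch is not by itself a stochastic resonance experiment.

```python
import math
import random

def euler_maruyama(dS, L, sigma, m0, T, n, seed=0):
    """Simulate dM_t = L dS/dM dt + sigma dB_t on [0, T] with n steps, cf. (6.8)."""
    rng = random.Random(seed)
    dt = T / n
    m = m0
    path = [m]
    for _ in range(n):
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        m = m + L * dS(m) * dt + sigma * dB
        path.append(m)
    return path

def dS_double_well(m):
    """dS/dM for the hypothetical S(M) = M**2/2 - M**4/4 (maxima at M = +1, -1)."""
    return m - m ** 3

quiet = euler_maruyama(dS_double_well, L=1.0, sigma=0.0, m0=0.5, T=20.0, n=2000)
noisy = euler_maruyama(dS_double_well, L=1.0, sigma=0.8, m0=0.5, T=20.0, n=2000, seed=1)
crossings = sum(1 for a, b in zip(noisy, noisy[1:]) if a * b < 0)
```

Without noise the path settles at $\mathcal{M} = 1$; with noise, `crossings` counts sign changes, a crude measure of hopping between the two wells.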

Such cognitive-epigenetic systems, which are driven by stochastic and noise-driven diffusion processes, may be suitably
conditioned to admit further noise perturbations that lead to a degree of stochastic resonance capable of amplifying a
relatively weak signal or actually reducing the level of randomness in the system. Such resonance may function as a catalyst
towards the system's self-organization and complexity, in the same way as open systems far from equilibrium require internal
amplification in order to reach a macroscopic dynamical structure (Gammaitoni et al., 1998; Prigogine, 1980; West et al.,
2005).

6.3 Rate distortion dynamics 

Recall that the rate distortion function $R(D)$ defines the minimum channel capacity necessary for the system to have an
average distortion $\le D$, thus imposing a limit on the information source uncertainty and suggesting how distortion measures
can drive information system dynamics. In other words, $R(D)$ affords a homological relation to free energy density, very
much along the lines of the above relation between free energy density and information source uncertainty. Accordingly, it is
proposed that the dynamics of cognitive modules interacting in characteristic real time $t$ will be constrained by the system
as described in terms of $R(D)$, but we now generalize matters as in Wallace and Wallace (2008) by introducing a function
$R(\mathbf{Q})$ of a vector argument $\mathbf{Q} = (Q_1, \ldots, Q_k)$, whose first component is defined to be the average distortion, and then
(cf. (6.1))

$$S_R = R(\mathbf{Q}) - \sum_{i=1}^{k} Q_i\, \partial R/\partial Q_i, \tag{6.9}$$

which leads to the deterministic and stochastic systems of equations analogous to the Onsager relations of nonequilibrium
thermodynamics

$$dQ_j/dt = \sum_i L_{j,i}\, \partial S_R/\partial Q_i, \tag{6.10}$$



together with

$$dQ_t^j = L_j(Q_1, \ldots, Q_k, t)\, dt + \sum_i \sigma_{j,i}(Q_1, \ldots, Q_k, t)\, dB_t^i, \tag{6.11}$$

where the $dB_t^i$ represent often highly structured stochastic noise whose properties may be described in terms of Brownian
motion and quadratic variation (see e.g. Kunita, 1990; Protter, 1995).

At this stage we introduce several examples, part of whose purpose is to motivate the introduction of the Ito principle
below.

Example 6.1. Firstly, for a simple Gaussian channel with noise having zero mean and variance $\sigma^2$, we have

$$S_R = R(D) - D\, dR(D)/dD = \frac{1}{2}\log(\sigma^2/D) + \frac{1}{2}. \tag{6.12}$$

The simplest possible Onsager relation becomes

$$dD/dt = -\mu\, dS_R/dD = \frac{\mu}{2D}, \tag{6.13}$$

in which the term $-dS_R/dD$ represents the force of an 'entropic wind', a kind of internal dissipation inevitably driving
the real-time system of interacting cognitive information sources toward greater distortion. Equation (6.13) has the solution
$D = \sqrt{\mu t}$, showing in this case that the average distortion increases monotonically with time. Following Wallace (2009a,
§7.2), this example shows that such a system will inevitably succumb to a relentless entropic force, requiring a constant free
energy expenditure to maintain some fixed average distortion in the communication between its modules. The
distortion in this case will, without free energy input, have a time dependence $D = f(t)$, with $f(t)$ monotonically increasing in
$t$, eventually leading to the punctuated failure of the system. Further, for the Einstein diffusion equation, a straightforward
argument of Wallace (2009a, §7.2) shows that the standard deviation of the particle position increases in proportion to $\sqrt{t}$.
Thus, whereas we do not expect the high correlations of an information source to exhibit typical Brownian motion, it does
seem to be the case that the distortion in communication between the interacting cognitive modules, within the appropriate
context of the Onsager relations, does display Brownian motion, which may be of bounded variation in certain cases.
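Equation (6.13) can be checked numerically. Since $d(D^2)/dt = \mu$, its exact solution with $D(0) = D_0$ is $D(t) = \sqrt{D_0^2 + \mu t}$, which reduces to $\sqrt{\mu t}$ for $D_0 = 0$. The sketch below starts from a small $D_0 > 0$ to avoid the singularity at $D = 0$ and uses a classical fourth-order Runge-Kutta step; the value of $\mu$ is an arbitrary assumption.

```python
import math

def rk4(rate, D0, T, n):
    """Classical fourth-order Runge-Kutta for dD/dt = rate(D); returns D(T)."""
    dt = T / n
    D = D0
    for _ in range(n):
        k1 = rate(D)
        k2 = rate(D + 0.5 * dt * k1)
        k3 = rate(D + 0.5 * dt * k2)
        k4 = rate(D + dt * k3)
        D += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return D

mu = 0.5  # hypothetical value of the coefficient mu in (6.13)

def entropic_wind(D):
    """Right-hand side of (6.13): dD/dt = -mu dS_R/dD = mu / (2D)."""
    return mu / (2 * D)

D0, T = 0.1, 10.0
numeric = rk4(entropic_wind, D0, T, n=10_000)
exact = math.sqrt(D0 ** 2 + mu * T)  # from d(D^2)/dt = mu
```

The agreement between `numeric` and `exact` illustrates the square-root growth of distortion in the absence of free energy input.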



6.4 Rate distortion coevolutionary dynamics 

Here we consider different cognitive developmental subprocesses of gene expression characterized by information sources $H_m$
interacting through chemical or other signals, and assume that different processes become each other's principal environments,
which is a suitable hypothesis within a broad coevolutionary context. Let

$$H_m = H_m(K_1, \ldots, K_s, \ldots, H_j, \ldots), \tag{6.14}$$

where the $K_s$ represent other relevant parameters, and $j \neq m$. We regard the dynamics of this system as driven by a recursive
network of stochastic differential equations. Letting the $K_j$ and $H_m$ all be represented as parameters $Q^j$ (with the caveat
that $H_m$ does not depend on itself), we follow the generalized Onsager formulation of Wallace and Wallace (2009), in terms
of the equation

$$S^m = H_m - \sum_i Q^i\, \partial H_m/\partial Q^i, \tag{6.15}$$

to obtain a recursive system of phenomenological Onsager relations in terms of a system of stochastic differential equations

$$dQ_t^j = \sum_i \Big[L_{j,i}(t, \ldots, \partial S^m/\partial Q^i, \ldots)\, dt + \sigma_{j,i}(t, \ldots, \partial S^m/\partial Q^i, \ldots)\, dB_t^i\Big], \tag{6.16}$$

in which, for ease of notation, both the terms $H_j$ and the external $K_j$ are expressed by the same symbol $Q^j$. As $m$ ranges
over the $H_m$ we could allow different kinds of 'noise' $dB_t^i$, having particular forms of quadratic variation, which may represent
a projection of environmental factors within the scope of a rate distortion manifold (Glazebrook and Wallace, 2009b).

The next step in extending (6.16) is to bring in rate distortion functions for mutual crosstalk between a set of interacting
cognitive modules by using the homology of $R(D)$ itself. To this end, consider different cognitive processes indexed $1, \ldots, s$,
and take the mutual rate distortion functions $R_{ij}$ characterizing communication (and distortion) between them. At the same
time, the essential parameters remain the characteristic time constants of each process, $\tau_j$ for $1 \le j \le s$, together with an
overall embedding free energy density $F$. Taking the $Q^\alpha$ to run over all the relevant parameters and mutual rate distortion
functions (along with the distortion measures $D_{ij}$), (6.15) now takes the form

$$S_\beta = R_\beta - \sum_k Q^k\, \partial R_\beta/\partial Q^k, \qquad \beta = \{i,j\}, \tag{6.17}$$

and accordingly (6.16) becomes

$$dQ^\alpha_t = \sum_{\beta=\{i,j\}} \Big[L_\beta(t, \ldots, \partial S_\beta/\partial Q^\alpha, \ldots)\, dt + \sigma_\beta(t, \ldots, \partial S_\beta/\partial Q^\alpha, \ldots)\, dB_t^\beta\Big]. \tag{6.18}$$

This last equation generalizes the treatment in terms of crosstalk, its distortion, the inherent time constants of the different
cognitive modules, and the overall available free energy density.

Example 6.2. For a Gaussian channel and a fixed embedding communication free energy density $F$ representing the richness
of incoming information from the interacting cognitive modules, we extend (6.13) to

$$dD/dt = \frac{\mu}{2D} - \alpha F, \qquad \alpha > 0, \tag{6.19}$$

where $D$ represents the communication distortion between the modules. The equilibrium solution is $D_{\text{equil}} = \mu/2\alpha F$. The
difference between (6.13) and (6.19) is that, whereas in the former case the distortion grows directly as the square root of the
elapsed time, equation (6.19) reveals a finite, equilibrium, average distortion that is inversely proportional to the available
environmental or informational free energy that the interacting systems can implement in order to navigate their actions.

The above situation can be generalized to $D_{\text{equil}} = \mu/2g(F)$, where $g(F)$ is monotonically increasing in $F$. Introducing a
characteristic response time variable $\tau$, so that

$$dD/dt = \frac{\mu}{2D} - g(F)h(\tau), \tag{6.20}$$

where $h(\tau)$ is also monotonically increasing, leads to

$$D_{\text{equil}} = \frac{\mu}{2g(F)h(\tau)}. \tag{6.21}$$

This example reveals that, given a fixed rate of available information free energy, increasing the allowable response time
decreases the average distortion in the interaction.

Example 6.3. Suppose now that feedback is allowed, so that the system actively seeks information in proportion to the
distortion between intent and impact; then the Onsager relation for a Gaussian channel becomes

$$dD/dt = \frac{\mu}{2D} - g(F)h(\tau)D, \tag{6.22}$$

and

$$D_{\text{equil}} = \Big(\frac{\mu}{2g(F)h(\tau)}\Big)^{1/2}, \tag{6.23}$$

which is significantly smaller than (6.21) whenever $\mu/2g(F)h(\tau) \gg 1$, and is effectively the classic result for Brownian motion
in a harmonic central field (e.g. equation (54) of Wang and Uhlenbeck, 1945).
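The equilibria of (6.20) and (6.22), namely $\mu/2g(F)h(\tau)$ and its square root, can be confirmed by relaxing both equations numerically with explicit Euler steps. The values of $\mu$, $g(F)$ and $h(\tau)$ below are hypothetical, chosen with $\mu > 2g(F)h(\tau)$ so that the feedback equilibrium is the smaller one.

```python
import math

def relax(rate, D0, T=50.0, n=50_000):
    """Explicit Euler integration of dD/dt = rate(D); returns D(T)."""
    dt = T / n
    D = D0
    for _ in range(n):
        D += dt * rate(D)
    return D

# Hypothetical values with mu > 2 g(F) h(tau)
mu, gF, h_tau = 8.0, 1.0, 1.5

def without_feedback(D):  # (6.20)
    return mu / (2 * D) - gF * h_tau

def with_feedback(D):     # (6.22)
    return mu / (2 * D) - gF * h_tau * D

D_eq_plain = mu / (2 * gF * h_tau)                # equilibrium of (6.20)
D_eq_feedback = math.sqrt(mu / (2 * gF * h_tau))  # equilibrium of (6.22)
```

Starting from $D_0 = 1$, both relaxations approach their equilibria; with feedback the restoring term grows with $D$, which is why the equilibrium scales as a square root.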



6.5 Multidimensional Ito process 

Together with the multiphase equilibrium problem of §5.4, we have so far pursued a theme of how the stochastic simulation
of biochemical systems closely parallels that of evolutionary-genetic systems. An initial observation here, and one that will
motivate further ideas, is that a stochastic differential equation of the type (6.18) should in principle model the dynamics
of large, intricate networks that are constrained by the costs of actual computational time, which seems relevant in certain
senses to many epigenetic processes. The general setting for such processes often involves an Ito stochastic differential
equation (viz. an Ito process), and this is how we intend to view (6.18), as a kind of 'master equation'.

For the reader's sake, we remark why such a level of mathematical formality is necessary by recalling a basic difficulty: the
sample paths $B_t$ of Brownian motion are not in general functions of bounded variation, so that $dB_t$ is not defined as for
the usual Riemann-Stieltjes integral. One may start by supposing that, for almost all samples, $Q_t$ in (6.18) is independent
of the future Brownian increments $B_u - B_t$, $u > t$; otherwise said, $Q_t$ is an adapted stochastic process. Putting it another way, the
information available at a given time includes the history of the process up to that time. More generally, the stochastic process
may be taken to be predictable in order to define the Ito integral of the equation (see e.g. §2.3 of Kunita, 1990 and Appendix
[5] here).
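The difficulty just described can be seen numerically. On a simulated Brownian path over $[0, t]$, refining the partition makes the total variation $\sum |\Delta B|$ grow roughly like $\sqrt{n}$, while the quadratic variation $\sum (\Delta B)^2$ settles near $t$; this is why $dB_t$ cannot be handled as a Riemann-Stieltjes integrator. The path length, seed and function names are arbitrary choices for illustration.

```python
import math
import random

def brownian_path(t, n, seed=0):
    """Sample B at n + 1 equally spaced times on [0, t], with B_0 = 0."""
    rng = random.Random(seed)
    dt = t / n
    B = [0.0]
    for _ in range(n):
        B.append(B[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return B

def variations(B, stride):
    """Total and quadratic variation of B over the partition with the given stride."""
    pts = B[::stride]
    incs = [b - a for a, b in zip(pts, pts[1:])]
    return sum(abs(x) for x in incs), sum(x * x for x in incs)

B = brownian_path(1.0, 2 ** 16, seed=3)
for stride in (256, 16, 1):
    print(stride, variations(B, stride))
```

With $n$ intervals, the expected total variation is $\sqrt{2nt/\pi}$, which diverges as the mesh shrinks, whereas the quadratic variation has expectation exactly $t$ for every partition.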

On the other hand, there is partial justification for making assumptions of (local) bounded variation in the Brownian
motion incurred via distortion, and in particular the adapted condition points to those examples and observations already
quoted that concern the dispensation of available free energy. Firstly, in view of Example 6.1 we may expect a constant
free energy expenditure for maintenance of some fixed average distortion in communication between the interacting cognitive
modules. Secondly, given a fixed rate of available information free energy, increasing the allowable response time decreases
average distortion in the interaction (Example 6.2); and thirdly, there is the possibility that a finite, equilibrium, average distortion
is actually inversely proportional to the available environmental or informational free energy that the interacting systems
can utilize (Example 6.3). However, to make matters precise, it is appropriate to consider the formalities of some filtered
probability space $(\Omega, \mathcal{F}, P)$, where $\mathcal{F} = \{\mathcal{F}_t : t \ge 0\}$ (see Appendix [8]), and postulate a multidimensional stochastic process
given by the Ito integral of the master equation (6.18):

$$Q^\alpha_t = Q^\alpha_0 + \sum_{\beta=\{i,j\}} \Big[\int_0^t L_\beta(s, \ldots, \partial S_\beta/\partial Q^\alpha, \ldots)\, ds + \int_0^t \sigma_\beta(s, \ldots, \partial S_\beta/\partial Q^\alpha, \ldots)\, dB_s^\beta\Big]. \tag{6.24}$$

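In order to state the conditions under which this process is well defined, we first express (6.24) in the simplified form

$$Q^\alpha_t = Q^\alpha_0 + A^\alpha(t) + \int_0^t \sigma^\alpha(s)\, dB_s, \tag{6.25}$$

where $\sigma^\alpha\, dB_s$ abbreviates the sum of the stochastic terms in (6.24). Then, following Harrison (1985, Chapter 4):

(1) $Q^\alpha_0$ is measurable with respect to $\mathcal{F}_0$.

(2) $\sigma^\alpha$ is an adapted stochastic process, and $P\{\int_0^t (\sigma^\alpha(s))^2\, ds < \infty\} = 1$ for all $t > 0$.

(3) The integral of the 'drift', $A^\alpha(t) = \sum_\beta \int_0^t L_\beta\, ds$, is a continuous and adapted variation-finite (VF) process.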

Granted that our observations about the local boundedness of the (rate distortion) Brownian motion essentially fulfill
these conditions, we now have an explicit stochastic model for the role of crosstalk, its distortion, the inherent time constraints
of the different cognitive modules, and the overall available free energy density, where the $Q$ parameter structure
represents the full-scale fragmentation of the system in the presence of some Wiener noise.

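Remark 6.2. A further possible generalization of (6.18) is to introduce into that expression a matrix-valued function
$V_\beta : \mathbb{R}^m \to M^{m \times n}$ to describe the intrinsic reaction rates (cf. Manninen et al., 2006):

$$dQ^\alpha_t = \sum_{\beta=\{i,j\}} \Big[L_\beta(t, \ldots, \partial S_\beta/\partial Q^\alpha, \ldots)\, dt + V_\beta\, \sigma_\beta(t, x, \ldots, \partial S_\beta/\partial Q^\alpha, \ldots)\, dB_t^\beta\Big].$$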



6.6 The stochastic flow 

Towards the possibility of a stochastic fiow generated by (j6.18p (such as a Brownian fiow of diffeomorphisms) , we opt to 
simplify (|6.18p accordingly. Following Kunita (1990) we write (|6.18p in a simplified form as 

$$ dQ_t = \sum_\alpha dQ^\alpha_t = f_0(Q_t, \partial S_R/\partial Q, t)\, dt + \sum_{k=1}^{m} f_k(Q_t, \partial S_R/\partial Q, t)\, dB^k_t. \tag{6.26} $$

Typically, we would seek a solution starting from some point $x$ at some time $s$; let us denote these solutions by $Q_{s,t}(x)$. A stochastic
flow can then be represented as the solution of a stochastic differential equation of the type

$$ Q_{s,t}(x) = x + \int_s^t F(Q_{s,r}(x), \partial S_R/\partial Q, dr), \tag{6.27} $$
where, in this case, the Brownian motion $F(x, \partial S_R/\partial Q, t)$, valued in vector fields, is given by

$$ F(x, \partial S_R/\partial Q, t) = \int_0^t f_0(x, \partial S_R/\partial Q, r)\, dr + \sum_{k=1}^{m} \int_0^t f_k(x, \partial S_R/\partial Q, r)\, dB^k_r. \tag{6.28} $$

The formal conditions for (6.27) to produce a Brownian flow of diffeomorphisms are (Kunita, 1990):

(1) $Q_{s,t}(x)$ is continuous in $s$, $t$, $x$.

(2) The map $Q_{s,t}(\cdot) : \mathbb{R}^d \to \mathbb{R}^d$ (for suitable $d$) is a diffeomorphism for any $s < t$.

(3) $Q_{s,u} = Q_{t,u} \circ Q_{s,t}$ for any $s < t < u$.
Note that conditions (1)-(3) are 'almost everywhere' (almost sure) conditions. Also, such a flow generates a holonomy or geometric
phase (transition), which can be explained by the process of tracking internal states in relation to a spatiotemporal
orientation. In more precise differential-geometric terms, holonomy results from the parallel transport of vectors around a
closed path, thus leading to a representation of the space of such closed paths into a group of global symmetries. The procedure
for constructing a holonomy groupoid associated to this flow involves some mathematical technicalities, but is nevertheless
standard (see e.g. Connes, 1994; Moerdijk and Mrcun, 2003). Whereas this case is fairly well behaved, we would in general
expect 'singularities' in the flow. The holonomy groupoid can still be constructed, but this involves deeper mathematics
outside the scope of this paper (see e.g. Debord (2001) for details).
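Conditions (1)-(3) can be made tangible with a discretized flow. The sketch below assumes an Euler scheme for (6.26) in one dimension, with hypothetical vector fields $f_0(x) = -\sin x$ and $f_1(x) = 0.5$ and the gradient argument $\partial S_R/\partial Q$ suppressed; every starting point is driven by the same Brownian increments. On the grid, the cocycle property (3) then holds exactly, and strict monotonicity of $x \mapsto Q_{s,t}(x)$ serves as a one-dimensional proxy for the diffeomorphism property (2).

```python
import math
import random
from typing import List


def f0(x: float) -> float:
    """Hypothetical smooth, Lipschitz drift vector field."""
    return -math.sin(x)


def f1(x: float) -> float:
    """Hypothetical diffusion vector field (additive noise)."""
    return 0.5


def make_increments(n_steps: int, dt: float, seed: int) -> List[float]:
    rng = random.Random(seed)
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def flow(i_start: int, i_end: int, x: float, db: List[float], dt: float) -> float:
    """Euler approximation of Q_{s,t}(x), with s = i_start * dt and t = i_end * dt,
    driven by the same Brownian increments db for every starting point x."""
    q = x
    for i in range(i_start, i_end):
        q = q + f0(q) * dt + f1(q) * db[i]
    return q


dt = 0.01
db = make_increments(300, dt, seed=7)
s, t, u = 0, 120, 300
x = 0.4

# Property (3): Q_{s,u}(x) = Q_{t,u}(Q_{s,t}(x)).
lhs = flow(s, u, x, db, dt)
rhs = flow(t, u, flow(s, t, x, db, dt), db, dt)
print(lhs == rhs)   # True

# Proxy for (2): each Euler step has derivative 1 - cos(q) dt > 0,
# so x -> Q_{s,u}(x) is strictly increasing, hence injective.
xs = [-2.0 + 0.1 * k for k in range(41)]
images = [flow(s, u, x0, db, dt) for x0 in xs]
print(all(a < b for a, b in zip(images, images[1:])))   # True
```

Sharing one noise path across all starting points is what makes this a flow of maps rather than a collection of unrelated sample paths.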

7 Discussion and conclusions 

We have given here a descriptive account of cognitive modules as components of epigenetic-evolutionary systems (as far-from-
equilibrium open systems) embedded within the context of environment and culture. Using dynamic groupoids of network
equivalence classes, we have put into atlas form the various constituent (inter)reactive systems, based on the rate distortion
principles of the Shannon theorems and on the groupoid free energy density. The rate distortion function $R(D)$ determines
a channel capacity that is measurable in a way analogous to a free energy that regulates many of the (inter)reactive/reciprocating
processes that have been described. Rate distortion arguments suggest that if an external information source
is pathogenic, then sufficient exposure to it within a developmental stage will likely result in an image inscribed on mind
and body in a punctuated fashion, subsequently causing a developmental dysfunction. In this analogy, a reduction of
$R(D)$ amounts to a 'lower temperature', which in turn directs the system toward behavioral patterns that are less enriched and
less complex. Further, accurate and efficient communicating systems require a greater channel capacity; keeping
in mind the analogy with free energy density, a higher rate of metabolism is then necessary and further costs are incurred (cf.
§3.4). Failure to provide such resources equates to a decline in processing, possibly to the point of disintegration. On the other
hand, increased communication between the system's cognitive modules, depending on the availability of free energy, will
usually be followed by a phase transition (in essence, this is what the holonomy groupoid encodes, as shown in Glazebrook
and Wallace (2009a)), inducing further complexity into the system's behavior. In principle one might envisage an associated
holonomy groupoid atlas for interacting cognitive-dynamical systems based on synchronous (geometric) phase transitions
of the various constituents. This would amount to a broad-scale descriptive artifact for understanding the
cumulative transitional mechanisms of living processes, and it may eventually uncover some even deeper conceptual issues.

Stochastic processes are perhaps more in keeping with how real living systems behave than a strictly deterministic
approach, though they are likely to entail higher computational costs, and some neuroscientific work in this area is
aimed at reducing such costs (cf. Manninen et al., 2006). Integrating the Onsager stochastic differential equation into
a multidimensional Ito process leads naturally to a stochastic flow which, in our formulation, diffuses across the atlas through
mainly noisy channels (cf. §5.4). Evidence of this kind is provided by Tlusty's (2007) rate-distortion analysis of the genetic coding map,
which shows that the evolution of the code unfolds as a smooth flow on the codon space.
With such examples in mind, we have presented here a novel development, very much in tune with the "flow" nature of living 
systems as once envisaged by Bertalanffy and Prigogine, while at the same time our approach using rate distortion principles 
embraces many central features of related theories of cognition such as those proposed in Atlan and Cohen (1998), Baars 
(1988), Baars and Franklin (2003), Cohen and Harel (2007) and Maturana and Varela (1980a, 1980b). 



8 Appendix: Probability space and Brownian motion 

We state some basic details, as found in Harrison (1985), Kunita (1990) and Protter (1985). Let $\Omega$ be a set. A collection
$\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-field if it contains the empty set and is closed under the operations of countable unions and
complements. It is customary to call $(\Omega, \mathcal{F})$ a measurable space, in which members of $\Omega$ are called samples and those of
$\mathcal{F}$ are called events. Let $P$ be a $\sigma$-additive measure on $(\Omega, \mathcal{F})$. It is called a probability if $P(\Omega) = 1$. The triple $(\Omega, \mathcal{F}, P)$ is
then called a probability space. Let $\mathbb{F} = \{\mathcal{F}_t : t \ge 0\}$ be a family of $\sigma$-algebras on $\Omega$ such that (a) $\mathcal{F}_t \subseteq \mathcal{F}$ for all $t \ge 0$, and
(b) $\mathcal{F}_s \subseteq \mathcal{F}_t$ if $s < t$. Then $\mathbb{F}$ is said to be a filtration (an increasing family of sub-$\sigma$-algebras) of $(\Omega, \mathcal{F})$. The filtration
$\mathbb{F}$ characterizes how information arises (how uncertainty is resolved): $\mathcal{F}_t$ may be interpreted as the set of all events
whose occurrence or nonoccurrence will be determined at time $t$. In a filtered probability space it is usually understood that
$\mathcal{F} = \mathcal{F}_\infty$.

A stochastic process $W = \{W_t\}$, $t \in T$, is said to be adapted (relative to $(\Omega, \mathbb{F}, P)$) if $W_t$ is measurable with respect to
$\mathcal{F}_t$ for all $t \ge 0$. Loosely speaking, this means that the information available at time $t$ includes the history of $W$ up to that
point. The predictable $\sigma$-field is the least $\sigma$-field on the product space $[0, T] \times \Omega$ for which all continuous $\mathcal{F}_t$-adapted processes
are measurable. A predictable process is then defined as a process that is measurable with respect to the predictable $\sigma$-field.
An example is a continuous $\mathcal{F}_t$-adapted process. Recall that a process of random variables $W = \{W_t\}$, $t \in T$, is a Brownian
motion (viz. a Wiener process) if and only if

(1) $W_0 = 0$ with probability 1.

(2) For $0 \le s < t < \infty$, the increment $W_t - W_s$ is normally distributed as $N(0, t - s)$.

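(3) For $0 \le t_0 < t_1 < \cdots < t_n < \infty$, the set of increments

$$ \{ W_{t_0},\; W_{t_j} - W_{t_{j-1}} : 1 \le j \le n \} $$

is a set of independent random variables (that is, the increments are independent of the past).

(4) The sample paths $t \mapsto W_t$ are continuous with probability 1.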

A process $Q$ is called a $(\mu, \sigma)$-Brownian motion if it has the form $Q_t = Q_0 + \mu t + \sigma W_t$, where $W$ is a Wiener process and $Q_0$
is independent of $W$. Then $Q_{t+s} - Q_t \sim N(\mu s, \sigma^2 s)$.
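The last property can be checked by simulation. The sketch below builds a Wiener path from independent $N(0, dt)$ increments (properties (1)-(3) on a grid), forms $Q_t = Q_0 + \mu t + \sigma W_t$, and compares the sample mean and variance of non-overlapping increments with $\mu s$ and $\sigma^2 s$. The parameter values, the grid, and the seed are arbitrary illustrative assumptions.

```python
import math
import random
from typing import List


def wiener_path(n_steps: int, dt: float, rng: random.Random) -> List[float]:
    """Standard Wiener process on a grid: W_0 = 0 plus independent N(0, dt) increments."""
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def drifted_bm(q0: float, mu: float, sigma: float, n_steps: int, dt: float,
               rng: random.Random) -> List[float]:
    """(mu, sigma)-Brownian motion Q_t = Q_0 + mu t + sigma W_t on the same grid."""
    w = wiener_path(n_steps, dt, rng)
    return [q0 + mu * (i * dt) + sigma * wi for i, wi in enumerate(w)]


def lagged_increments(path: List[float], lag: int) -> List[float]:
    """Increments Q_{t+s} - Q_t over disjoint intervals of length s = lag * dt (i.i.d.)."""
    return [path[i + lag] - path[i] for i in range(0, len(path) - lag, lag)]


mu, sigma, dt, lag = 0.5, 2.0, 0.01, 50          # s = lag * dt = 0.5
path = drifted_bm(0.0, mu, sigma, n_steps=200_000, dt=dt, rng=random.Random(1))
incs = lagged_increments(path, lag)
mean = sum(incs) / len(incs)
var = sum((x - mean) ** 2 for x in incs) / (len(incs) - 1)
print(f"{len(incs)} increments: mean {mean:.3f} (theory {mu * lag * dt:.3f}), "
      f"variance {var:.3f} (theory {sigma ** 2 * lag * dt:.3f})")
```

Using disjoint intervals keeps the increments independent, so the usual sample estimators apply directly.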

9 Appendix: Basic results of information theory 

9.1 The Shannon uncertainties 

Invoking the spirit of the Shannon-McMillan Theorem, it is possible to define an APSE information source $X$ associated with
stochastic variates $X_j$ having joint and conditional probabilities $P(a_0, \ldots, a_n)$ and $P(a_n \mid a_0, \ldots, a_{n-1})$ such that the appropriate
joint and conditional Shannon uncertainties satisfy the classic relations

$$ H[X] = \lim_{n \to \infty} \frac{\log[N(n)]}{n} = \lim_{n \to \infty} H(X_n \mid X_0, \ldots, X_{n-1}) = \lim_{n \to \infty} \frac{H(X_0, \ldots, X_n)}{n}, \tag{9.1} $$

where $N(n)$ denotes the number of high-probability sequences of length $n$.

This information source is defined as dual to the underlying ergodic cognitive process (Wallace, 2005).
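The last two limits in (9.1) can be checked numerically for the simplest ergodic source, a stationary two-state Markov chain; this source is an illustrative assumption, not the cognitive source of the text. For such a chain $H(X_0, \ldots, X_{n-1}) = H(\pi) + (n-1)\, H(X_1 \mid X_0)$, so the per-symbol block uncertainty decreases toward the conditional uncertainty. (Dividing an $n$-symbol block by $n$, rather than an $(n+1)$-symbol block, does not change the limit.)

```python
import itertools
import math
from typing import Iterable, List

Matrix = List[List[float]]


def entropy(probs: Iterable[float]) -> float:
    """Shannon uncertainty in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def stationary_two_state(P: Matrix) -> List[float]:
    """Stationary law of P = [[1-a, a], [b, 1-b]]: pi = (b, a) / (a + b)."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]


def block_entropy(P: Matrix, pi: List[float], n: int) -> float:
    """H(X_0, ..., X_{n-1}) for the stationary chain, by enumerating all 2^n paths."""
    probs = []
    for path in itertools.product(range(2), repeat=n):
        p = pi[path[0]]
        for x, y in zip(path, path[1:]):
            p *= P[x][y]
        probs.append(p)
    return entropy(probs)


def entropy_rate(P: Matrix, pi: List[float]) -> float:
    """H(X_n | X_{n-1}) = sum_i pi_i H(P[i]); for a Markov source this equals
    H(X_n | X_0, ..., X_{n-1}), the middle limit in (9.1)."""
    return sum(pi[i] * entropy(P[i]) for i in range(len(P)))


P = [[0.9, 0.1], [0.4, 0.6]]      # hypothetical transition matrix
pi = stationary_two_state(P)       # approximately (0.8, 0.2)
h = entropy_rate(P, pi)
for n in (1, 2, 4, 8, 12):
    print(n, round(block_entropy(P, pi, n) / n, 4), "->", round(h, 4))
```

Brute-force enumeration is exponential in $n$, which is acceptable here only because the point is to verify the identity, not to estimate entropy rates efficiently.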

Recall that the Shannon uncertainties $H(\ldots)$ are cross-sectional law-of-large-numbers sums of the form $-\sum_k P_k \log[P_k]$,
where the $P_k$ constitute a probability distribution (for the basic details, see Ash, 1990; Berger, 1971; Cover and Thomas, 1991;
Khinchin, 1957). Messages from an information source, seen as symbols $x_j$ from some alphabet, each having probabilities
$P_j$ associated with a random variable $X$, are 'encoded' into the language of a 'transmission channel', a random variable $Y$
with symbols $y_k$ having probabilities $P_k$, possibly with error. Someone receiving the symbol $y_k$ then retranslates it (without
error) into some $x_k$, which may or may not be the same as the $x_j$ that was sent. More formally, the message sent along the
channel is characterized by a random variable $X$ having the distribution

$$ P(X = x_j) = P_j, \quad j = 1, \ldots, M. \tag{9.2} $$
The channel through which the message is sent is characterized by a second random variable Y having the distribution 

$$ P(Y = y_k) = P_k, \quad k = 1, \ldots, L. \tag{9.3} $$
Let the joint probability distribution of X and Y be defined as 

$$ P(X = x_j, Y = y_k) = P(x_j, y_k) = P_{j,k}, \tag{9.4} $$
and the conditional probability of Y given X as 

$$ P(Y = y_k \mid X = x_j) = P(y_k \mid x_j). \tag{9.5} $$

Then the Shannon uncertainties of $X$ and $Y$ individually, and the joint uncertainty of $X$ and $Y$ together, are defined
respectively as

$$
\begin{aligned}
H(X) &= -\sum_{j=1}^{M} P_j \log(P_j), \\
H(Y) &= -\sum_{k=1}^{L} P_k \log(P_k), \\
H(X, Y) &= -\sum_{j=1}^{M} \sum_{k=1}^{L} P_{j,k} \log(P_{j,k}).
\end{aligned} \tag{9.6}
$$
The conditional uncertainty of $Y$ given $X$ is defined as

$$ H(Y \mid X) = -\sum_{j=1}^{M} \sum_{k=1}^{L} P_{j,k} \log[P(y_k \mid x_j)]. \tag{9.7} $$

For any two stochastic variates $X$ and $Y$, we have the inequality $H(Y) \ge H(Y \mid X)$, as knowledge of $X$ generally gives
some knowledge of $Y$. Equality occurs only in the case of stochastic independence. Since $P(x_j, y_k) = P(x_j)\,P(y_k \mid x_j) = P(y_k)\,P(x_j \mid y_k)$, it is
deduced that

$$ H(X \mid Y) = H(X, Y) - H(Y). \tag{9.8} $$
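Definitions (9.6)-(9.8) translate directly into code. The sketch below assumes a small hypothetical joint table $P_{j,k}$ (not from the text), computes the uncertainties in bits, and checks the identity $H(X \mid Y) = H(X, Y) - H(Y)$ together with $H(Y) \ge H(Y \mid X)$.

```python
import math
from typing import Iterable, List, Tuple

Joint = List[List[float]]  # joint[j][k] = P(x_j, y_k) = P_{j,k}


def entropy(probs: Iterable[float]) -> float:
    """Shannon uncertainty -sum p log2 p in bits (terms with p = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginals(joint: Joint) -> Tuple[List[float], List[float]]:
    px = [sum(row) for row in joint]            # P_j, as in (9.2)
    py = [sum(col) for col in zip(*joint)]      # P_k, as in (9.3)
    return px, py


def joint_entropy(joint: Joint) -> float:
    return entropy(p for row in joint for p in row)   # H(X, Y) in (9.6)


def cond_y_given_x(joint: Joint) -> float:
    """(9.7): H(Y|X) = -sum_{j,k} P_{j,k} log P(y_k | x_j)."""
    px, _ = marginals(joint)
    return -sum(pjk * math.log2(pjk / px[j])
                for j, row in enumerate(joint) for pjk in row if pjk > 0)


def cond_x_given_y(joint: Joint) -> float:
    """H(X|Y) = -sum_{j,k} P_{j,k} log P(x_j | y_k)."""
    _, py = marginals(joint)
    return -sum(pjk * math.log2(pjk / py[k])
                for row in joint for k, pjk in enumerate(row) if pjk > 0)


joint = [[0.30, 0.10, 0.10],
         [0.05, 0.25, 0.20]]               # hypothetical P_{j,k}, M = 2, L = 3
px, py = marginals(joint)
print(f"H(X|Y) = {cond_x_given_y(joint):.4f}, "
      f"H(X,Y) - H(Y) = {joint_entropy(joint) - entropy(py):.4f}")
print(f"H(Y) = {entropy(py):.4f} >= H(Y|X) = {cond_y_given_x(joint):.4f}")
```

Replacing `joint` by a product table `px[j] * py[k]` makes the two sides of $H(Y) \ge H(Y \mid X)$ coincide, which is the independence case mentioned above.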

The information transmitted by translating the variable $X$ into the channel transmission variable $Y$ (possibly with error),
and then retranslating the transmitted $Y$ back into $X$ without error, is defined as

$$
\begin{aligned}
I(X \mid Y) &\equiv H(X) - H(X \mid Y) \\
&= H(X) + H(Y) - H(X, Y),
\end{aligned} \tag{9.9}
$$

where we refer to Berger (1971), Cover and Thomas (1991) and Khinchin (1957) for details. The essential point is that if 
there is no uncertainty in X given the channel Y , then there is no loss of information through transmission. In general this 
will not be true, and herein lies the essence of the theory. 
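A standard textbook case (not one used in the paper) makes (9.9) concrete: a binary symmetric channel with uniform input and crossover probability $\varepsilon$, for which the transmitted information is $1 - H_b(\varepsilon)$ bits. The sketch below computes it from the joint table via $H(X) + H(Y) - H(X, Y)$; the function names are hypothetical.

```python
import math
from typing import Iterable, List


def entropy(probs: Iterable[float]) -> float:
    """Shannon uncertainty in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def bsc_joint(p_x0: float, eps: float) -> List[List[float]]:
    """Joint P(x_j, y_k) for a binary symmetric channel with crossover eps."""
    p_x = [p_x0, 1.0 - p_x0]
    channel = [[1.0 - eps, eps], [eps, 1.0 - eps]]   # P(y_k | x_j)
    return [[p_x[j] * channel[j][k] for k in range(2)] for j in range(2)]


def transmitted_information(joint: List[List[float]]) -> float:
    """(9.9): I = H(X) + H(Y) - H(X, Y), in bits."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_xy = entropy(p for row in joint for p in row)
    return entropy(p_x) + entropy(p_y) - h_xy


for eps in (0.0, 0.11, 0.5):
    print(eps, round(transmitted_information(bsc_joint(0.5, eps)), 4))
```

At $\varepsilon = 0$ there is no uncertainty in $X$ given $Y$ and the full bit is transmitted; at $\varepsilon = 0.5$ the output carries no information about the input.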

9.2 The Rate Distortion Theorem 

Following Wallace (2005), suppose we have an (ergodic) information source $Y$ with output from a particular alphabet,
generating sequences of the form

$$ y^n = y_1, \ldots, y_n. \tag{9.10} $$

These are 'digitalized' in some sense, inducing a chain of 'digitalized' values

$$ b^n = b_1, \ldots, b_n, \tag{9.11} $$

where the $b$-alphabet is considered more restricted than the $y$-alphabet. In this way, $b^n$ is deterministically retranslated into
a reproduction $\hat{y}^n$ of the signal $y^n$. That is, each $b^n$ is mapped onto a unique $n$-length $\hat{y}$-sequence in the alphabet of $\hat{Y}$:

$$ b^n \to \hat{y}^n = \hat{y}_1, \ldots, \hat{y}_n. \tag{9.12} $$

We remark that many sequences $y^n$ may be mapped onto the same retranslation sequence $\hat{y}^n$, the set of which is denoted $\hat{Y}$;
this may be interpreted as a loss of information.
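The many-to-one character of $y^n \to b^n \to \hat{y}^n$ can be seen in a toy example. The alphabets and the maps `digitalize` and `retranslate` below are hypothetical: a four-letter $y$-alphabet is coarse-grained onto a two-letter $b$-alphabet, and each $b$-letter is reproduced as a fixed $y$-letter. Grouping all source sequences by their reproduction shows how many distinct $y^n$ collapse onto each $\hat{y}^n$.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Y_ALPHABET = (0, 1, 2, 3)   # hypothetical y-alphabet; the b-alphabet is {0, 1}


def digitalize(y: Sequence[int]) -> Tuple[int, ...]:
    """y^n -> b^n: a hypothetical coarse-graining onto the smaller b-alphabet."""
    return tuple(yi // 2 for yi in y)


def retranslate(b: Sequence[int]) -> Tuple[int, ...]:
    """b^n -> y-hat^n: deterministic reproduction in the y-alphabet, as in (9.12)."""
    return tuple(2 * bi for bi in b)


def preimages(n: int) -> Dict[Tuple[int, ...], List[Tuple[int, ...]]]:
    """Group every y^n of length n by the reproduction y-hat^n it is mapped onto."""
    groups: Dict[Tuple[int, ...], List[Tuple[int, ...]]] = defaultdict(list)
    for y in product(Y_ALPHABET, repeat=n):
        groups[retranslate(digitalize(y))].append(y)
    return dict(groups)


groups = preimages(3)
print(len(groups), "reproductions for", len(Y_ALPHABET) ** 3, "source sequences")
# 8 reproductions for 64 source sequences
```

Here each reproduction has eight preimages, which is the information loss the text refers to.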

A distortion measure $d : Y \times \hat{Y} \to \mathbb{R}^+$ between paths $y^n$ and $\hat{y}^n$ is defined as

$$ d(y^n, \hat{y}^n) = \frac{1}{n} \sum_{i=1}^{n} d(y_i, \hat{y}_i), \tag{9.13} $$

for some suitable distance function $d$ (such as the Hamming distance). Suppose that with each path $y^n \in Y$ and each
$b^n$-path retranslation $\hat{y}^n \in \hat{Y}$ into the $y$-language, we associate the individual, joint, and conditional probability
distributions

$$ p(y^n), \quad p(\hat{y}^n), \quad p(y^n \mid \hat{y}^n). \tag{9.14} $$

The average distortion is then defined to be

$$ D \equiv \sum_{y^n} p(y^n)\, d(y^n, \hat{y}^n). \tag{9.15} $$
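Equations (9.13) and (9.15) can be evaluated exactly for short blocks by enumeration. The sketch below assumes a hypothetical i.i.d. source on the letters $\{0, 1, 2, 3\}$, a hypothetical retranslation that sends each letter to the even letter below or equal to it, and Hamming distance per letter. Since only odd letters are distorted, $D$ equals the probability of an odd letter for every block length $n$.

```python
import itertools
import math
from typing import Sequence, Tuple

# Hypothetical i.i.d. source over the y-alphabet {0, 1, 2, 3}.
P_Y = {0: 0.4, 1: 0.1, 2: 0.3, 3: 0.2}


def retranslate(y: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical y^n -> b^n -> y-hat^n: b_i = y_i // 2, then y-hat_i = 2 * b_i."""
    return tuple(2 * (yi // 2) for yi in y)


def hamming(a: int, b: int) -> float:
    """Per-letter distance d(y_i, y-hat_i) used inside (9.13)."""
    return 0.0 if a == b else 1.0


def path_distortion(y: Sequence[int], y_hat: Sequence[int]) -> float:
    """(9.13): d(y^n, y-hat^n) = (1/n) sum_i d(y_i, y-hat_i)."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("paths must be non-empty and of equal length")
    return sum(hamming(a, b) for a, b in zip(y, y_hat)) / len(y)


def average_distortion(n: int) -> float:
    """(9.15): D = sum over y^n of p(y^n) d(y^n, y-hat^n), by exhaustive enumeration."""
    total = 0.0
    for y in itertools.product(P_Y, repeat=n):
        p = math.prod(P_Y[yi] for yi in y)
        total += p * path_distortion(y, retranslate(y))
    return total


sample = (0, 1, 2, 3, 3, 0)
print(path_distortion(sample, retranslate(sample)))   # 0.5 (3 of 6 letters differ)
for n in (1, 2, 3):
    print(n, round(average_distortion(n), 6))         # 0.3 for each n
```

Exhaustive enumeration grows as $4^n$, so this is only a check of the definitions; in practice $D$ would be estimated from samples.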

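For the corresponding strings $Y$ (incoming) and $\hat{Y}$ (outgoing), applying the Shannon uncertainty rule (9.9) gives

$$
\begin{aligned}
I(Y, \hat{Y}) &\equiv H(Y) - H(Y \mid \hat{Y}) \\
&= H(Y) + H(\hat{Y}) - H(Y, \hat{Y}).
\end{aligned} \tag{9.16}
$$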

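The information rate distortion function $R(D)$ for a source sequence $Y$ and retranslated sequence $\hat{Y}$, with distortion measure
$d : Y \times \hat{Y} \to \mathbb{R}^+$, is defined as follows. Let $\mathcal{D} = \sum_{(y, \hat{y})} p(y)\, p(\hat{y} \mid y)\, d(y, \hat{y})$. Then

$$ R(D) = \min_{p(\hat{y} \mid y) \,:\, \mathcal{D} \le D} I(Y, \hat{Y}). \tag{9.17} $$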
To explain this notation, the minimization is over all conditional distributions $p(\hat{y} \mid y)$ for which the joint distribution $p(y, \hat{y}) =
p(y)\, p(\hat{y} \mid y)$ has average distortion less than or equal to $D$. The Rate Distortion Theorem (see e.g. Berger, 1971; Cover
and Thomas, 1991) states that $R(D)$ is the minimum necessary rate of information transmission (effectively the channel
capacity) such that the average distortion does not exceed $D$.
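Definition (9.17) can be evaluated numerically with the Blahut-Arimoto algorithm (described in Cover and Thomas), which the text itself does not use. The sketch below is a minimal version, assuming finite alphabets and a fixed slope parameter `beta > 0`; each run returns one point $(D, R(D))$ of the curve. For a Bernoulli($p$) source with Hamming distortion, the known closed form $R(D) = H_b(p) - H_b(D)$ for $0 \le D \le \min(p, 1-p)$ provides a check.

```python
import math
from typing import List, Tuple


def hb(x: float) -> float:
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def blahut_arimoto(p_y: List[float], dist: List[List[float]], beta: float,
                   n_iter: int = 10_000) -> Tuple[float, float]:
    """Return one point (D, R) on the rate distortion curve, with R in bits.
    dist[y][k] is the per-letter distortion d(y, y-hat_k); beta sets the slope."""
    if n_iter < 1:
        raise ValueError("n_iter must be positive")
    n_y, n_yh = len(p_y), len(dist[0])
    q = [1.0 / n_yh] * n_yh                      # reproduction marginal q(y-hat)
    for _ in range(n_iter):
        # Test channel p(y-hat | y) proportional to q(y-hat) exp(-beta d(y, y-hat)).
        cond = []
        for y in range(n_y):
            w = [q[k] * math.exp(-beta * dist[y][k]) for k in range(n_yh)]
            z = sum(w)
            cond.append([wk / z for wk in w])
        # q becomes the output marginal induced by that test channel.
        q = [sum(p_y[y] * cond[y][k] for y in range(n_y)) for k in range(n_yh)]
    avg_d = sum(p_y[y] * cond[y][k] * dist[y][k]
                for y in range(n_y) for k in range(n_yh))
    # I(Y, Y-hat) for the joint p(y) p(y-hat | y), whose output marginal is q.
    rate = sum(p_y[y] * cond[y][k] * math.log2(cond[y][k] / q[k])
               for y in range(n_y) for k in range(n_yh) if cond[y][k] > 0)
    return avg_d, rate


p = 0.3
hamming_dist = [[0.0, 1.0], [1.0, 0.0]]
D, R = blahut_arimoto([1 - p, p], hamming_dist, beta=2.0)
print(f"D = {D:.4f}, R = {R:.4f} bits, closed form H_b(p) - H_b(D) = {hb(p) - hb(D):.4f}")
```

Sweeping `beta` traces out the curve; small values push $D$ toward $\min(p, 1-p)$, where $R(D)$ reaches zero and convergence becomes slow.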